XSF Discussion - 2020-08-21

jonas’ 05:29:24
larma, lovetox, I am of the opinion that '394 is a dead end and we should instead revive a different subset of XHTML, with clearer and stricter rules, plus maybe a reference cleanup implementation in JavaScript.
Seve 05:54:25
+1
MattJ 06:02:17
I used to agree, but these days I see all the problems that come with allowing multiple representations of the body
larma 06:39:20
jonas’: why do you think so? I think 394 has great potential because it's extensible and has a properly defined fallback behavior. It's a good design and easy to implement. What are the issues you see?
jonas’ 06:41:00
larma, I doubt the "easy to implement" part
jonas’ 06:41:58
it has nasty corner cases (though counting itself is not one of them, since it’s clearly defined to count code points), e.g. what happens if a markup span ends right between two codepoints belonging to one emoji?
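To make that corner case concrete, here is a minimal Python sketch. It assumes a span is given as half-open code point offsets (start, end); `split_at_span` is a hypothetical helper for illustration, not something taken from the XEP. Python strings index by code point, so plain slicing matches the counting rule mentioned above.

```python
def split_at_span(text: str, start: int, end: int) -> tuple[str, str, str]:
    """Split text into (before, marked, after) using code point offsets."""
    return text[:start], text[start:end], text[end:]


# One visible "family" emoji: three emoji joined by ZERO WIDTH JOINER (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
body = "hi " + family + " bye"

# A span over code points 0..3 ends right after the first emoji of the sequence.
before, marked, after = split_at_span(body, 0, 4)

print(len(family))    # code points in what renders as a single emoji
print(ascii(marked))  # the marked part ends with only the first emoji
print(ascii(after))   # the rest starts with a dangling joiner
```

A client that renders `marked` and `after` as separate styled runs may show something different from a client that ignores the markup and renders the body as one string.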
jonas’ 06:42:15
I also don’t particularly like the fallback characters, they are bound to cause annoyances.
lovetox 06:42:43
jonas’, that's not a nasty corner case
lovetox 06:42:54
you try to find *something* that maybe does not work
lovetox 06:42:59
which never happens in real life
jonas’ 06:43:03
lovetox, it may not be for an emoji, but I don’t dare to say it can’t cause problems in some scripts. Unicode is a strange thing.
lovetox 06:43:15
and even if it did, people would not see it as "OMG, I can't even end a span in the middle of an emoji"
jonas’ 06:43:20
lovetox, yeah, that’s how you design robust things, not by saying "ah, that’s never going to happen!!"
jonas’ 06:43:53
lovetox, of course, nobody will complain that in XHTML-IM, they can’t do onload="alert()". But if an attacker does, all hell breaks loose.
lovetox 06:43:53
btw i also would love a subset of xhtml
lovetox 06:44:00
but i guess everyone can already implement that
lovetox 06:44:05
there is no need for a new XEP?
jonas’ 06:44:11
how can everyone implement that already?
lovetox 06:44:28
just implement only a subset ...
lovetox 06:44:32
like everyone does already
jonas’ 06:44:34
but which subset?
jonas’ 06:44:43
ok, I don’t want to discuss this with you right now
lovetox 06:44:44
the one you care about
lovetox 06:45:17
that's why I'm asking, I can just trim my XHTML impl
lovetox 06:45:26
or do a totally new 394
larma 06:50:58
jonas’: iirc, the same issue of multi-codepoint emoji being split across multiple spans is not well defined in any popular markup system including HTML.
larma 06:56:43
So honestly that seems to be a non-issue. And I can hardly imagine a proper design that would have specific handling and not cause immense developer work as you'd have to tap into the font rendering library which you usually try not to do
jonas’ 08:21:45
larma, it is irrelevant as long as there’s only a single possible representation of the text. However, with 394, there are two: one with markup applied and one without.
jonas’ 08:22:13
if the goal of '394 is to avoid different meaning of text with and without markup applied, these corner cases need to be investigated
dwd 08:22:23
MattJ, I think the root problem is multiple displays of the body - that is, a message that can mean one thing to one person (or intermediary filtering system) and something else to another. I think that problem is way worse with XHTML-IM and similar, minimal with 393, and 394 represents a reasonable compromise (assuming it gains the "this was markup" tag; without it, it should be minimal as well).
MattJ 08:28:56
dwd: that's basically my opinion too
jonas’ 08:29:23
larma, can I convince you to take over '394? ;)
larma 08:32:14
I still have my work on "sims 2" and stickers pending ;)
larma 08:32:24
Oh, and reactions
larma 08:32:45
But beside that, I'd be OK to take it over
jonas’ 08:32:58
larma, sims 2 the game?
larma 08:33:03
No
jonas’ 08:33:06
oh damn
larma 08:33:35
Sims as in 385
jonas’ 08:33:44
right
larma 08:36:22
http://larma.de/xeps/sfs.html#intro
jonas’ 08:38:24
I like most of that, though I do see some use for mixed content.
jonas’ 08:39:28
larma, immediate feedback:
- allow more than one <file-sharing/> element per message;
- add an ID to each file-sharing element so that it can be referenced by future specifications.
jonas’ 08:39:39
then it would be trivial to build a combination of that + 394 which allows inline images :)
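A rough Python sketch of what that feedback would allow: several file-sharing elements in one message, each with an `id` that another extension could point at. The namespace `urn:example:file-sharing`, the child element names, and both helper functions are hypothetical placeholders, not taken from the linked draft.

```python
import xml.etree.ElementTree as ET

SFS_NS = "urn:example:file-sharing"  # hypothetical placeholder namespace


def build_message(body_text, files):
    """Build a <message/> carrying one file-sharing element per (id, name, url)."""
    message = ET.Element("message", {"type": "chat"})
    ET.SubElement(message, "body").text = body_text
    for file_id, name, url in files:
        sharing = ET.SubElement(message, f"{{{SFS_NS}}}file-sharing", {"id": file_id})
        ET.SubElement(sharing, f"{{{SFS_NS}}}name").text = name
        ET.SubElement(sharing, f"{{{SFS_NS}}}source").text = url
    return message


def find_file(message, file_id):
    """Resolve a reference (for example from a markup span) to its file element."""
    for element in message.findall(f"{{{SFS_NS}}}file-sharing"):
        if element.get("id") == file_id:
            return element
    return None


msg = build_message(
    "Two photos from today",
    [
        ("f1", "a.jpg", "https://example.org/a.jpg"),
        ("f2", "b.jpg", "https://example.org/b.jpg"),
    ],
)
print(ET.tostring(msg, encoding="unicode"))
```

With stable IDs, an inline image would only need a span that names `f1` or `f2`; how such a span would be encoded is left open here.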
emus 08:41:49
> http://larma.de/xeps/sfs.html#intro
👍
LNJ 08:48:53
larma: Great work, I really like the changes you made and especially the attaching of new sources. Have you thought about using message fastening for this? And another point that I was also missing in SIMS is that there is no example for including the thumbnail data (BoB would also allow communication of the data via IQs).
LNJ 08:49:18
+1 for allowing multiple files
Daniel 08:51:55
Looks good on first glance.
Daniel 08:52:09
Cleans up the issues I have with sims
Daniel 08:52:19
Will take a closer look later in the day
Daniel 08:56:38
Oh, 2.3 is interesting, especially but not only in a group context. It would be interesting to see if and how successfully that will actually get implemented
Daniel 08:57:44
but good backwards compatibility with the x-oob+body method
Daniel 08:57:59
and will solve the weird standstill we currently have with SIMS
larma 09:18:02
jonas’: mixed content only makes sense for media files, you can't mix a random binary with normal text. So mixed content is intentionally out of scope there.
dwd 09:19:27
"Here's the PDF you wanted" - seems pretty useful to me.
dwd 09:20:46
The other problem that we run into is "Here is the current COVID-19 protocol" - we want later "hits" on that file identifier to get the latest version, ideally.
larma 09:21:47
dwd: you can still send message and file and files can have descriptions
dwd 09:22:30
Yes, you can (and that's what we do), but it means a search for the message doesn't naturally locate the file.
larma 09:23:07
If you use file description it could.
dwd 09:24:45
Yes, true. But that means the file description would end up being used as a message, if you're not careful. Depends very heavily on the UI; a file sharing extension on iOS, for example, is likely to end up using a description, whereas sharing a file inline to a chat is more likely to be showing a message+attachment kind of metaphor.
larma 09:25:37
Also searching for files is inherently a huge problem. For PDFs you'd want to actually search the content of the file as well. For pictures this would also be great but you'd need to OCR + image detection which is not easy.
dwd 09:26:07
Well, that's another problem entirely.
dwd 09:26:35
But anyway, I think overall, our greater problem is (essentially) versioning rather than the message+file case.
larma 09:26:42
I totally agree this XEP is not to solve all possible scenarios. It intentionally does not replace sims, but provides an alternative that is more like an evolution of what we are doing now (oob)
larma 09:27:46
Versioning to me seems like a total sidecase. You are typically also not versioning messages (even with lmc, you don't edit messages from weeks ago)
larma 09:28:04
If you need file versioning, oob seems like a sane approach to me
dwd 09:29:47
I know you're not versioning messages. But it would be nice if this could just share, essentially, a link in the same format.
dwd 09:31:46
Which ought to be a simple matter of having much of the file metadata element optional, and possibly a mechanism [given some stable identifier] of finding the current detailed metadata.
larma 09:33:30
My suggestion for versioning files if you want to stick with xmpp only would be to put the file metadata on a pubsub node and then just send a reference to that pubsub node in the message. You can update it (including getting a history) at any time and have a pointer to the actual file content (as http link) in it. Thereby you get proper versioning with possibility to fetch historic file versions.
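The idea can be sketched in plain Python without any XMPP library. `MetadataNode` is an in-memory stand-in for a pubsub node that keeps its item history, and the node name, URLs, and hash strings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    name: str
    url: str      # pointer to the actual file content (HTTP link)
    sha256: str   # placeholder value in this sketch, not a real hash


@dataclass
class MetadataNode:
    """In-memory stand-in for a pubsub node that retains older items."""
    history: list = field(default_factory=list)

    def publish(self, meta: FileMetadata) -> int:
        self.history.append(meta)
        return len(self.history) - 1  # index of the new version

    def latest(self) -> FileMetadata:
        return self.history[-1]


nodes = {"files/covid-protocol": MetadataNode()}
nodes["files/covid-protocol"].publish(
    FileMetadata("protocol.pdf", "https://example.org/protocol-v1.pdf", "hash-of-v1")
)

# The chat message only carries a reference to the node, not the metadata itself.
message = {"body": "Here is the current protocol", "file_ref": "files/covid-protocol"}

# Later the file is updated; the old message still resolves to the newest version.
nodes["files/covid-protocol"].publish(
    FileMetadata("protocol.pdf", "https://example.org/protocol-v2.pdf", "hash-of-v2")
)
current = nodes[message["file_ref"]].latest()
print(current.url)
```

The cost shows up on the receiving side: a client has to know how to follow `file_ref` before it can show anything.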
larma 09:34:17
But that's totally overkill for the one shot file sharing I believe most users want.
dwd 09:34:30
Oh, sure, we *could*. But then the client has to be able to understand that.
larma 09:35:56
You are talking about a niche feature. It's not going to be implemented by all clients, no matter how hard you push, so better to keep the basic feature alone for those only interested in that.
dwd 09:40:09
Yeah, sure, but I'm suggesting that allowing some metadata to be optional means versioning is then supported (alongside named URL sharing, actually) but in a uniform format that will "just work" for most receiving clients.
dwd 09:40:33
So I don't *need* to push on receivers to support anything.
larma 09:48:28
Well, you'd need to get rid of the file hash at least which means the file is not authenticated through the message anymore (assuming the message is) and you also can't make use of the 2.3 feature
jonas’ 09:58:54
larma, I always find it annoying when I have to write text messages separately from the media or files I send, since the file upload typically happens asynchronously.
jonas’ 09:59:03
so I can’t know when my text message arrives related to the blob
jonas’ 09:59:11
which is meh, also for the flow of reading on the receiving end
!XSF_Martin 09:59:44
Yeah, the way other messengers do it is nice.
!XSF_Martin 10:00:14
Where the image and your comment are one message.
eevvoor 10:00:25
!XSF_Martin, how do they do it exactly?
eevvoor 10:01:14
Are the technical details known?
!XSF_Martin 10:01:19
On the protocol level no idea. Threema and WhatsApp are closed source. Maybe signal does this too? Then one might have a look how they do it.
MattJ 10:01:43
It's not exactly rocket science to put text and a URL in a single message :)
Zash 10:02:15
Inline vs attached vs singletons?
!XSF_Martin 10:02:15
Probably they have some uploaded file url in some field and the text/body in another.
dwd 10:08:44
MattJ, Yes, but what about a series of hashes in multiple algorithms?
dwd 10:09:38
Also, yes, that problem of slow links and file/image uploads is a problem we have to solve as well.
MattJ 10:55:58
In MIX the messages are in the participants' archives... does the MIX server *also* keep an archive? Would clients ever query it?
jonas’ 10:56:08
yes, always
jonas’ 10:56:19
because we don’t have a reliable s2s sync protocol, so the user’s archive is bound to be incomplete.
jonas’ 10:56:23
also for history before you joined
MattJ 10:56:29
So why store them in the user's archive then?
jonas’ 10:56:34
no idea :)
jonas’ 10:57:03
(that was a bit of a snark. Actually, having them in the user’s archive is very convenient for the user and we should maybe see if we can fix the s2s sync)
dwd 10:57:09
MattJ, Yes, for example if a MIX channel were used as a pager replacement, someone newly allocated to the pager will query the archive to find previous messages in a conversation.
MattJ 10:58:04
Sounds simple™
MattJ 10:58:16
from a client perspective
dwd 10:59:38
Dealing with S2S sync is certainly a good problem to tackle, BTW.
Zash 11:00:34
S2S MAM?
jonas’ 11:00:36
for certain definitions of "good"
Zash 11:02:52
"sync"... ugh
dwd 11:03:05
Well, reliability.
dwd 11:04:02
I think we don't want to be tackling more than dropped connections. Sustained disconnected-mode operation for S2S isn't something that's worth tackling - things like FMUC handle those kinds of cases.
Zash 11:09:52
So do we need more than s2s stream management?
dwd 11:10:04
Zash, Probably not, actually.
jonas’ 11:10:09
Zash, yes, because you’ll miss messages while the server is restarting
jonas’ 11:10:35
unless you can persist s2s state && the remote will hold their state long enough so that your kernel upgrade + 10 minute reboot time is covered
dwd 11:11:00
jonas’, So do we need an outgoing buffer and retry semantics rather than bounce-with-error?
jonas’ 11:11:15
dwd, I don’t think that’d be a good idea
jonas’ 11:11:37
maybe s2s SM + pull-based MAM sync from MIX-enabled user-servers after a server restart would be sufficient?
Zash 11:11:42
MIX-side delivery tracking?
Ge0rG 11:12:13
what about a thing that's somewhere in between 0198 and MAM; keep a queue of "important" stanzas, use unique IDs instead of counter values, resync after session setup.
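One way to read that suggestion, as a hedged Python sketch: the sender keeps unacknowledged "important" stanzas keyed by unique ID, and after a new session is set up the peer reports the last ID it saw. `ImportantStanzaQueue` and its method names are invented here; nothing like this is specified in 0198 or MAM.

```python
from collections import OrderedDict


class ImportantStanzaQueue:
    """Holds unacknowledged stanzas by unique ID, in the order they were sent."""

    def __init__(self):
        self._pending = OrderedDict()  # stanza id -> stanza

    def sent(self, stanza_id, stanza):
        self._pending[stanza_id] = stanza

    def acked(self, stanza_id):
        # Unlike a counter-based ack, this removes exactly one stanza.
        self._pending.pop(stanza_id, None)

    def resync(self, last_seen_id):
        """Return the stanzas to resend after session setup.

        If the peer's last seen ID is still pending, resend what follows it;
        otherwise (unknown ID or None) resend everything still pending.
        """
        ids = list(self._pending)
        if last_seen_id in self._pending:
            ids = ids[ids.index(last_seen_id) + 1:]
        return [self._pending[i] for i in ids]


queue = ImportantStanzaQueue()
queue.sent("a", "first")
queue.sent("b", "second")
queue.sent("c", "third")
queue.acked("a")
print(queue.resync("b"))   # peer saw "b" but never acknowledged it
print(queue.resync(None))  # peer remembers nothing
```

The queue is still unbounded unless something caps it, so this sketch does not by itself address a peer that stays away for a long time.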
jonas’ 11:12:13
dwd, problem with the outbound queues is their size will be limited at some point, pushing the problem only back. A highly active MIX could exhaust that limit during your kernel upgrade.
jonas’ 11:12:58
though SM and outbound queues are still problematic anyways...
jonas’ 11:13:17
so in fact, a server would have to do pull-based MAM sync whenever it is not able to SM-resume with the other side, no matter the reason.
jonas’ 11:13:20
look, that sounds like what clients do!
Zash 11:13:44
MIX is server side MUC?
dwd 11:13:53
jonas’, For all its users?
jonas’ 11:14:05
and then the server would have to both sync the messages into MAM *and* replay them live to already-connected clients (which may have synced with the *local* MAM already and think they’re up-to-date) *and* queue and delay any *live* messages from the MIX so that everything arrives in order
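A minimal Python sketch of the ordering part of that: while the catch-up query is running, live messages are held back, and once the missed messages arrive everything is delivered in order with duplicates dropped. `ChannelCatchUp`, the `deliver` callback, and the message dicts are assumptions for illustration; archiving and the actual MAM query are left out.

```python
class ChannelCatchUp:
    """Keeps delivery in order while a catch-up sync from the channel is running."""

    def __init__(self, deliver):
        self.deliver = deliver  # callback that archives and/or forwards one message
        self.syncing = False
        self.held = []          # live messages received during the sync

    def start_sync(self):
        self.syncing = True

    def on_live_message(self, msg):
        if self.syncing:
            self.held.append(msg)  # delay live traffic until the gap is filled
        else:
            self.deliver(msg)

    def finish_sync(self, missed):
        """Deliver missed messages first, then held live ones, skipping duplicates."""
        seen = set()
        for msg in list(missed) + self.held:
            if msg["id"] not in seen:
                seen.add(msg["id"])
                self.deliver(msg)
        self.held = []
        self.syncing = False


delivered = []
catch_up = ChannelCatchUp(delivered.append)
catch_up.start_sync()
catch_up.on_live_message({"id": "m3"})  # arrives while still syncing
catch_up.finish_sync([{"id": "m1"}, {"id": "m2"}, {"id": "m3"}])
catch_up.on_live_message({"id": "m4"})
print([m["id"] for m in delivered])
```

This does not cover clients that already synced with the local archive before the gap was filled, which is the harder half of the problem described above.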
dwd 11:14:08
jonas’, And you think this is better scaling than a 0198 queue?
jonas’ 11:14:18
dwd, I think this serves a different purpose than a '198 queue
dwd 11:14:42
jonas’, Is that purpose to make everyone's life harder? If so, mission accomplished. :-)
jonas’ 11:15:02
dwd, the purpose is to achieve reliable message delivery
jonas’ 11:15:16
I’m not sure how you’re going to achieve that with '198 alone. It hasn’t sufficed for c2s, it won’t suffice for s2s either.
Zash 11:16:40
Ugh, sync :(
jonas’ 11:17:42
dwd, there are two key guarantees which need to be upheld and which make this very hard:
- In-order message delivery from the MIX to the client
- No insertions into the middle of the user’s MAM archive
Ge0rG 11:17:55
Can't we just have forever-persistent 0198?
jonas’ 11:18:05
and this is why Ge0rG (sorry for putting words in your mouth again) and I have been saying that the user’s local archive is a terrible idea and only going to cause pain.
Zash 11:18:20
Can't we just embrace fast delivery or failure notification?
MattJ 11:18:31
Notify the MIX that delivery failed
Ge0rG 11:18:41
Zash: there was a one-message thread on message errors some time ago...
Ge0rG 11:18:55
MattJ: and then the MIX can kick the user out! Win-win!
MattJ 11:19:02
Sounds like a plan
Ge0rG 11:19:16
And when the user rejoins, they just do a full sync!
Zash 11:19:28
Message attachments in the form of delivery statuses?
MattJ 11:19:33
I knew MIX could solve all the problems of MUC in the simplest possible way
Ge0rG 11:19:34
Or you just add a tombstone to all local user archives whenever s2s fails
dwd 11:20:01
Ge0rG, Put in a tombstone for every missed message. Seems legit.
Ge0rG 11:20:30
dwd: but you don't know which / how many messages you missed!
dwd 11:21:00
Ge0rG, You would if you had tombstones.
jonas’ 11:21:12
dwd, how is the server to know how many messages it missed?
dwd 11:21:27
jonas’, Because of the tombstones. I fail to see how you can fault my logic here.
Ge0rG 11:21:57
dwd: aaah, right! The tombstones! It's obvious to me now!
jonas’ 11:22:33
sorry, my sarcasmometer is out-of-service due to the heatwave
dwd 11:24:05
jonas’, Storm Ellen here, my smilies have been blown away.
Ge0rG 11:26:10
surely just a random weather phenomenon not related in any way to ocean heating or the CO2 amounts in the atmosphere
jonas’ 11:26:18
Ge0rG, surely.
jonas’ 11:27:15
Ge0rG, ThiS iS a SAfE sPaCe gO AWaY wiTH yOUr ClIMAtE ChaNGE IdeOLogY!
jonas’ 11:27:41
s/ChaNGE/CaTAsTrOPHy/, too
Ge0rG 11:28:25
</trigger-warning>
Holger 14:14:49
> and this is why Ge0rG (sorry for putting words in your mouth again) and I have been saying that the user’s local archive is a terrible idea and only going to cause pain.
I've been saying that all day long (can't remember saying anything else since I first heard of MIX) and would still very much prefer to keep this feature optional.
Ge0rG 14:15:45
Holger: but you are not the XEP author.
MattJ 15:00:03
Still, if we have community consensus that this design is flawed, we can still change it, right? If the MIX has an archive anyway, clients just need to query that instead
MattJ 15:00:12
...right...?
MattJ 15:01:25
Relatedly: https://framapiaf.org/@debacle/104713005724817353 (thanks debacle)
Kev 15:12:08
The only reason for MIX to need to be in the user's archive is for search.
Kev 15:12:19
(From memory)
Kev 15:12:32
Well, bandwidth too, but mostly search.
MattJ 15:12:33
If that's the case, I'm happy with that
Holger 15:12:41
MattJ: At least during the Summit it seemed to me the consensus is to have user archives. But yes I'd think we should just have clients check a feature (IIRC there is one in MIX-PAM) to decide which archive to query. Increases complexity on the client side which I'm usually all for avoiding at all cost, but seems the least evil to me in this case.
Kev 15:12:44
If someone could come up with a decent solution to search it would be nice to drop it.
Kev 15:13:19
(Ok, I think I'm up to three reasons - search, bandwidth, and persistence)
MattJ 15:13:28
Yeah, I think I'd rather tackle the search problem than turn the current architecture on its head and face a whole set of other problems
Holger 15:13:29
Kev: I also remember scalability arguments, i.e. the case where the client is joined to thousands of rooms.
Holger 15:13:37
And persistence, yes.
Holger 15:14:34
I get all that but I think we'd need to solve a couple of problems before at least I would be able to implement user archives. And it would be nice if that wouldn't block MIX.
Kev 15:14:34
You really don't want to be in a room that keeps history for a day, come back two days later and not see the replies to your messages.
Kev 15:15:27
It's one of those 'no ideal solutions' things, I think, that just hurts because of federated architectures.
Holger 15:15:43
Right now the user server would either duplicate to death or have to implement black dedup magic. Plus the sync issues.
Kev 15:15:44
User archives seemed like the least bad solution.
jonas’ 15:16:05
Kev, so a user archive which seems complete, but has gaps nobody knows about is better than an archive which tells you "sorry, I don’t have your newest message, there is a gap here"
Holger 15:16:08
So I think right now it's a horrible combination of the downsides of Matrix with those of XMPP.
jonas’ 15:16:11
that logic seems flawed to me
Holger 15:16:13
I think.
Kev 15:16:21
jonas’: Dear Strawman, love...?
eta 15:16:36
so, the thing I never got with XEP-0045 is
jonas’ 15:16:36
Kev, I can’t follow, sorry
jonas’ 15:16:39
my english fails me
Kev 15:16:55
jonas’: You're presenting a position that isn't mine, then arguing that it's wrong, and therefore I am.
eta 15:17:04
why can't the server just keep a log of what was said after a resource left, persist that, and then replay it as join history?
jonas’ 15:17:20
Kev, sorry, not my intention, maybe I missed something
Kev 15:17:21
I don't think user archives should have gaps in them, which is why we needed the sync logic between MIX and User archive.
Holger 15:17:33
eta: It could. MAM is just nicer because of the paging.
jonas’ 15:17:35
Kev, ok, I missed that, sorry
jonas’ 15:17:44
ignore me & carry on :)
Kev 15:17:56
It was a discussion at the Summit about how we needed to ensure we could detect holes in the user's view of the MIX archive, and plug them.
MattJ 15:18:05
eta, paging and tracking is harder than you think (usually people "leave" the MUC long after they lost their connection)
Kev 15:18:08
Maybe it was a side-room discussion, I forget at this point.
jonas’ 15:18:10
I might’ve missed that discussion :/
eta 15:18:17
MattJ, ah right, reasonable enough
jonas’ 15:18:48
eta, also unbounded storage requirements on the server side. What if the resource never comes back?
eta 15:19:08
jonas’, well I guess you'd need some cap
eta 15:19:18
but nvm, I'm convinced MAM is probably ideal
jonas’ 15:19:28
eta, also, MAM-MUC is pretty much that, except that the client has to explicitly ask ;)
MattJ 15:19:32
Just like with MAM. In fact Prosody (and I'd be surprised if not other servers) uses the MAM archive to fulfil MUC history requests these days
Holger 15:24:44
BTW 0369, 7.2.1 says the user's server MUST archive, 7.2.2 says it MAY?
dwd 15:46:38
Holger, Take the average, it's a SHOULD.
Holger 15:46:49
🙂
dwd 15:47:11
Although if it says MUST twice and MAY once, it's a REALLY SHOULD.
Holger 15:47:22
Makes sense.
Holger 15:48:17
I'm okay with MUST archive as long as I MAY apply an arbitrary expiry period of 0 or more seconds.
jonas’ 15:49:09
:>
jonas’ 15:49:37
Holger, you are aware that you can’t do that as a user server, right?
jonas’ 15:49:42
not without purging all the other messages too
jonas’ 15:49:45
due to how MAM works ;)
Holger 15:51:01
Well with our specific implementation (stanza ID being a timestamp) things wouldn't break.
jonas’ 15:51:35
holes in MAM are forbidden
Holger 15:53:27
Yes my statement was just that "things wouldn't break".
jonas’ 15:53:50
only with an expiry of exactly 0s implemented by not visibly storing the messages
jonas’ 15:54:20
otherwise you have to violate '313 by not returning the correct response for an ID not in your archive.
jonas’ 15:54:32
> If any UID requested by the client in any of the 'before-id', 'after-id' or 'ids' form fields is not present in the archive, the server MUST return an item-not-found error in response to the query.
Holger 15:54:33
Yes that's what I'm doing.
jonas’ 15:55:07
Shame on you! Violating a MUST!
jonas’ 15:55:13
(also, yes it is friday)
Holger 15:55:32
Wasn't in earlier revisions IIRC, and isn't in 0059. We have generic 0059 code to do 0059 with MAM and other things, and I'm not keen on "if MAM then this else that" special casing.
jonas’ 15:57:30
nothing to do with '59 that bit of text
jonas’ 15:57:43
gotta go now tho
Holger 15:59:10
Not sure how to parse that. 0059 says what to do if a UID requested is not present in the archive. 0313 says something else on the same topic.
Holger 16:00:10
Generic implementation is one reasoning, the other is avoiding the additional SQL query on each and every MAM request.
MattJ 16:04:50
The error is to allow clients to detect gaps in the sync
MattJ 16:04:54
if that's not obvious
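A small Python sketch of why the error helps. The archive is modelled as an ordered list of `(uid, message)` pairs; `ItemNotFound`, `query_after`, and `client_sync` are illustrative names rather than any library's API, and falling back to a full fetch is just one possible client reaction.

```python
class ItemNotFound(Exception):
    """Stands in for the item-not-found error returned for an unknown UID."""


def query_after(archive, after_id=None):
    """Return entries after after_id; raise if that UID is not in the archive."""
    if after_id is None:
        return list(archive)
    uids = [uid for uid, _ in archive]
    if after_id not in uids:
        raise ItemNotFound(after_id)
    return archive[uids.index(after_id) + 1:]


def client_sync(archive, last_known_uid):
    """Incremental sync that notices when its anchor has vanished from the archive."""
    try:
        return "incremental", query_after(archive, last_known_uid)
    except ItemNotFound:
        # The anchor expired or was never stored: there may be a gap.
        return "gap", query_after(archive)


archive = [("u2", "hello"), ("u3", "world")]  # "u1" has already expired
print(client_sync(archive, "u2"))
print(client_sync(archive, "u1"))
```

A server that silently returned results for the unknown `u1` would leave the client unable to tell the second case from the first.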
Holger 16:05:50
I understand the reasoning but that doesn't make me like the solution ;-)
Guus 16:18:44
Does anyone know an administrator for jabber.cz?
Guus 16:19:43
They might want to review their user using smash55 for its username
dwd 16:19:49
What would things look like if MIX messages did not go into the user's archive ever, and instead were fetched on demand - could this be managed by the server or would it have to be managed by the client?
Guus 16:20:54
dwd: re your question on twitter: try Greg and Dele
!XSF_Martin 16:26:17
Guus: What's the issue with that username?
Guus 16:27:36
!XSF_Martin: it is the point of contact that's advertised in spam messages
!XSF_Martin 16:28:41
Ah, I misinterpreted
> review their user … for its username
So I thought the username itself was the problem. Sorry, not a native speaker.
dwd 16:28:53
Guus, Oh, good call. Though i think that's maybe the wrong bit of BT entirely.
Guus 16:30:20
dwd: could be, but I believe they've both moved around
Guus 16:30:30
Worth a shot