XSF Discussion - 2020-08-21

  1. jonas’

    larma, lovetox, I am of the opinion that '394 is a dead-end and we should instead revive a different subset of XHTML, with clearer and stricter rules plus maybe a reference cleanup implementation in JavaScript.

  2. Seve


  3. MattJ

    I used to agree, but these days I see all the problems that come with allowing multiple representations of the body

  4. larma

    jonas’: why do you think so? I think 394 has great potential because it's extensible and has a properly defined fallback behavior. It's a good design and easy to implement. What are the issues you see?

  5. jonas’

    larma, I doubt the "easy to implement" part

  6. jonas’

    it has nasty corner cases (though counting itself is not one of them, since it’s clearly defined to count code points), e.g. what happens if a markup span ends right between two codepoints belonging to one emoji?
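
A minimal Python sketch of the corner case raised here, assuming spans are addressed by code point offsets as described above. The helper names `code_points` and `split_at` are made up for illustration and do not come from any XEP. It shows that a span boundary can fall inside a single visible emoji, leaving a dangling zero width joiner on the other side.

```python
import unicodedata

# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL.
# One visible glyph on most platforms, but five code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
body = "hi " + family + "!"


def code_points(text: str) -> int:
    """Length in Unicode code points (which is what len() counts for str)."""
    return len(text)


def split_at(text: str, end: int) -> tuple[str, str]:
    """Split text at the code point offset where a hypothetical span ends."""
    return text[:end], text[end:]


# A span covering offsets 0..4 ends after the first code point of the emoji.
inside, outside = split_at(body, 4)
```

Here `inside` ends with a lone MAN emoji and `outside` starts with a zero width joiner, so rendering the two halves with different styling no longer shows the original glyph. Whether a receiver should reject, widen, or ignore such a span is exactly the open question in the discussion.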

  7. jonas’

    I also don’t particularly like the fallback characters, they are bound to cause annoyances.

  8. lovetox

    jonas’, thats not a nasty corner case

  9. lovetox

    you try to find *something* that maybe does not work

  10. lovetox

    which never happens in real life

  11. jonas’

    lovetox, it may not be for an emoji, but I don’t dare to say it can’t cause problems in some scripts. Unicode is a strange thing.

  12. lovetox

    and even if, people would not see it as "OMG i can't even end a span in the middle of an emoji"

  13. jonas’

    lovetox, yeah, that’s how you design robust things, not by saying "ah, that’s never going to happen!!"

  14. jonas’

    lovetox, of course, nobody will complain that in XHTML-IM, they can’t do onload="alert()". But if an attacker does, all hell breaks loose.
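
A rough sketch of what an allow-list cleanup could look like, written in Python with only the standard library rather than the JavaScript mentioned earlier. The `ALLOWED` tag set and `SAFE_SCHEMES` list are assumptions chosen for illustration; which subset to permit is precisely what the discussion leaves undecided. The sketch drops unknown tags and attributes but does not balance tags, so it is not a complete sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: tag name -> permitted attribute names.
ALLOWED = {"p": set(), "em": set(), "strong": set(), "a": {"href"}, "br": set()}
# Hypothetical set of URL schemes accepted in href.
SAFE_SCHEMES = ("http:", "https:", "xmpp:", "mailto:")


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is still emitted
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # e.g. onload, onerror, style
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean(markup: str) -> str:
    """Return markup reduced to the allow-listed tags and attributes."""
    parser = Cleaner()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

With this sketch, `clean('<p onload="alert()">hi</p>')` keeps the paragraph and silently drops the event handler, which is the behaviour an attacker-facing implementation needs even if no honest user ever asks for it.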

  15. lovetox

    btw i also would love a subset of xhtml

  16. lovetox

    but i guess everyone can already implement that

  17. lovetox

    there is no need for a new XEP?

  18. jonas’

    how can everyone implement that already?

  19. lovetox

    just implement only, a subset ...

  20. lovetox

    like everyone does already

  21. jonas’

    but which subset?

  22. jonas’

    ok, I don’t want to discuss this with you right now

  23. lovetox

    the one you care about

  24. lovetox

    thats why im asking, i can just trim my xhtml impl

  25. lovetox

    or do a totally new 394

  26. larma

    jonas’: iirc, the same issue of multi-codepoint emoji being split across multiple spans is not well defined in any popular markup system including HTML.

  27. larma

    So honestly that seems to be a non-issue. And I can hardly imagine a proper design that would have specific handling and not cause immense developer work as you'd have to tap into the font rendering library which you usually try not to do

  28. jonas’

    larma, it is irrelevant as long as there’s only a single possible representation of the text. However, with 394, there are two: one with markup applied and one without.

  29. jonas’

    if the goal of '394 is to avoid different meaning of text with and without markup applied, these corner cases need to be investigated

  30. dwd

    MattJ, I think the root problem is multiple displays of the body - that is, a message that can mean one thing to one person (or intermediary filtering system) and something else to another. I think that problem is way worse with XHTML-IM and similar, minimal with 393, and 394 represents a reasonable compromise (assuming it gains the "this was markup" tag; without it should be minimal as well).

  31. MattJ

    dwd: that's basically my opinion too

  32. jonas’

    larma, can I convince you to take over '394? ;)

  33. larma

    I still have my work on "sims 2" and stickers pending ;)

  34. larma

    Oh, and reactions

  35. larma

    But beside that, I'd be OK to take it over

  36. jonas’

    larma, sims 2 the game?

  37. larma


  38. jonas’

    oh damn

  39. larma

    Sims as in 385

  40. jonas’


  41. larma


  42. jonas’

    I like most of that, though I do see some use for mixed content.

  43. jonas’

    larma, immediate feedback:
    - allow more than one <file-sharing/> element per message;
    - add an ID to each file-sharing element so that it can be referenced by future specifications.

  44. jonas’

    then it would be trivial to build a combination of that + 394 which allows inline images :)
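
A small Python sketch of the two suggestions above: several `<file-sharing/>` elements in one message, each carrying an ID that something else could reference. Everything except the `file-sharing` element name is an assumption made for illustration: the namespace `urn:example:file-sharing:0`, the `id` attribute name, and the `source` child with a `url` attribute are hypothetical and not taken from the draft.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:file-sharing:0"  # hypothetical namespace, not from any XEP


def build_message(body: str, files: list[tuple[str, str]]) -> ET.Element:
    """Build a message with one file-sharing element per (id, url) pair."""
    message = ET.Element("message", {"type": "chat"})
    ET.SubElement(message, "body").text = body
    for file_id, url in files:
        sharing = ET.SubElement(message, f"{{{NS}}}file-sharing", {"id": file_id})
        # Hypothetical child element pointing at the file content.
        ET.SubElement(sharing, f"{{{NS}}}source", {"url": url})
    return message


def file_ids(message: ET.Element) -> list[str]:
    """Collect the IDs that a later specification could point at."""
    return [el.get("id") for el in message.findall(f"{{{NS}}}file-sharing")]
```

A markup layer could then refer to one of the returned IDs to place that file inline, instead of relying on element order.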

  45. emus

    > http://larma.de/xeps/sfs.html#intro
    👍

  46. LNJ

    larma: Great work, I really like the changes you made and especially the attaching of new sources. Have you thought about using message fastening for this? And another point that I was also missing in SIMS is that there is no example for including the thumbnail data (BoB would also allow communication of the data via IQs).

  47. LNJ

    +1 for allowing multiple files

  48. Daniel

    Looks good on first glance.

  49. Daniel

    Cleans up the issues I have with sims

  50. Daniel

    Will take a closer look later in the day

  51. Daniel

    Oh 2.3 is interesting. especially but not only in a group context. would be interesting to see if and how successfully that will actually get implemented

  52. Daniel

    but good downwards compatibility to the x-oob+body method

  53. Daniel

    and will solve the weird standstill we currently have with SIMS

  54. larma

    jonas’: mixed content only makes sense for media files, you can't mix a random binary with normal text. So mixed content is intentionally out of scope there.

  55. dwd

    "Here's the PDF you wanted" - seems pretty useful to me.

  56. dwd

    The other problem that we run into is "Here is the current COVID-19 protocol" - we want later "hits" on that file identifier to get the latest version, ideally.

  57. larma

    dwd: you can still send message and file and files can have descriptions

  58. dwd

    Yes, you can (and that's what we do), but it means a search for the message doesn't naturally locate the file.

  59. larma

    If you use file description it could.

  60. dwd

    Yes, true. But that means the file description would end up being used as a message, if you're not careful. Depends very heavily on the UI; a file sharing extension on iOS, for example, is likely to end up using a description, whereas sharing a file inline to a chat is more likely to be showing a message+attachment kind of metaphor.

  61. larma

    Also searching for files is inherently a huge problem. For PDFs you'd want to actually search the content of the file as well. For pictures this would also be great but you'd need to OCR + image detection which is not easy.

  62. dwd

    Well, that's another problem entirely.

  63. dwd

    But anyway, I think overall, our greater problem is (essentially) versioning rather than the message+file case.

  64. larma

    I totally agree this XEP is not to solve all possible scenarios. It intentionally does not replace sims, but provides an alternative that is more like an evolution of what we are doing now (oob)

  65. larma

    Versioning to me seems like a total sidecase. You are typically also not versioning messages (even with lmc, you don't edit messages from weeks ago)

  66. larma

    If you need file versioning, oob seems like a sane approach to me

  67. dwd

    I know you're not versioning messages. But it would be nice if this could just share, essentially, a link in the same format.

  68. dwd

    Which ought to be a simple matter of having much of the file metadata element optional, and possibly a mechanism [given some stable identifier] of finding the current detailed metadata.

  69. larma

    My suggestion for versioning files if you want to stick with xmpp only would be to put the file metadata on a pubsub node and then just send a reference to that pubsub node in the message. You can update it (including getting a history) at any time and have a pointer to the actual file content (as http link) in it. Thereby you get proper versioning with possibility to fetch historic file versions.

  70. larma

    But that's totally overkill for the one shot file sharing I believe most users want.

  71. dwd

    Oh, sure, we *could*. But then the client has to be able to understand that.

  73. larma

    You are talking about a niche feature. It's not going to be implemented by all clients, no matter how hard you push, so better to keep the basic feature alone for those only interested in that.

  74. dwd

    Yeah, sure, but I'm suggesting that allowing some metadata to be optional means versioning is then supported (alongside named URL sharing, actually) but in a uniform format that will "just work" for most receiving clients.

  75. dwd

    So I don't *need* to push on receivers to support anything.

  76. larma

    Well, you'd need to get rid of the file hash at least which means the file is not authenticated through the message anymore (assuming the message is) and you also can't make use of the 2.3 feature
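
To make the role of the file hash concrete, here is a hedged Python sketch of how a receiver might check downloaded bytes against hashes carried in an (authenticated) message. The function names and the policy in `verify` (at least one supported algorithm, and every supported one must match) are assumptions for illustration, not something the draft specifies.

```python
import hashlib


def digests(data: bytes, algorithms=("sha256", "sha3_256")) -> dict[str, str]:
    """Compute hex digests of data in several algorithms (sender side)."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


def verify(data: bytes, advertised: dict[str, str]) -> bool:
    """Check data against advertised digests (receiver side).

    Returns True only if at least one advertised algorithm is supported
    locally and every supported one matches.
    """
    checked = False
    for name, expected in advertised.items():
        if name not in hashlib.algorithms_available:
            continue  # unknown algorithm: cannot check, so skip it
        if hashlib.new(name, data).hexdigest() != expected.lower():
            return False
        checked = True
    return checked
```

If the hashes are made optional to allow "latest version" links, `verify` has nothing to check, which is the loss of authentication pointed out above.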

  77. jonas’

    larma, I always find it annoying when I have to write text messages separately from the media or files I send, since the file upload typically happens asynchronously.

  78. jonas’

    so I can’t know when my text message arrives related to the blob

  79. jonas’

    which is meh, also for the flow of reading on the receiving end

  80. !XSF_Martin

    Yeah, the way other messengers do it is nice.

  81. !XSF_Martin

    Where the image and your comment are one message.

  82. eevvoor

    !XSF_Martin how do they do it exactly?

  83. eevvoor

    Are the technical details known?

  84. !XSF_Martin

    On the protocol level no idea. Threema and WhatsApp are closed source. Maybe signal does this too? Then one might have a look how they do it.

  85. MattJ

    It's not exactly rocket science to put text and a URL in a single message :)

  86. Zash

    Inline vs attached vs singletons?

  87. !XSF_Martin

    Probably they have some uploaded file url in some field and the text/body in another.

  88. dwd

    MattJ, Yes, but what about a series of hashes in multiple algorithms?

  89. dwd

    Also, yes, that problem of slow links and file/image uploads is a problem we have to solve as well.

  90. MattJ

    In MIX the messages are in the participants' archives... does the MIX server *also* keep an archive? Would clients ever query it?

  91. jonas’

    yes, always

  92. jonas’

    because we don’t have a reliable s2s sync protocol, so the user’s archive is bound to be incomplete.

  93. jonas’

    also for history before you joined

  94. MattJ

    So why store them in the user's archive then?

  95. jonas’

    no idea :)

  96. jonas’

    (that was a bit of a snark. Actually, having them in the user’s archive is very convenient for the user and we should maybe see if we can fix the s2s sync)

  97. dwd

    MattJ, Yes, for example if a MIX channel were used as a pager replacement, someone newly allocated to the pager will query the archive to find previous messages in a conversation.

  98. MattJ

    Sounds simple™

  99. MattJ

    from a client perspective

  100. dwd

    Dealing with S2S sync is certainly a good problem to tackle, BTW.

  101. Zash

    S2S MAM?

  102. jonas’

    for certain definitions of "good"

  103. Zash

    "sync"... ugh

  104. dwd

    Well, reliability.

  105. dwd

    I think we don't want to be tackling more than dropped connections. Sustained disconnected-mode operation for S2S isn't something that's worth tackling - things like FMUC handle those kinds of cases.

  106. Zash

    So do we need more than s2s stream management?

  107. dwd

    Zash, Probably not, actually.

  108. jonas’

    Zash, yes, because you’ll miss messages while the server is restarting

  109. jonas’

    unless you can persist s2s state && the remote will hold their state long enough so that your kernel upgrade + 10 minute reboot time is covered

  111. dwd

    jonas’, So do we need an outgoing buffer and retry semantics rather than bounce-with-error?

  112. jonas’

    dwd, I don’t think that’d be a good idea

  113. jonas’

    maybe s2s SM + pull-based MAM sync from MIX-enabled user-servers after a server restart would be sufficient?

  114. Zash

    MUX-side delivery tracking ?

  115. Ge0rG

    what about a thing that's somewhere in between 0198 and MAM; keep a queue of "important" stanzas, use unique IDs instead of counter values, resync after session setup.
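
One possible reading of this idea, sketched in Python: the sender keeps "important" stanzas in an ordered queue keyed by unique IDs, and after session setup the peer names the last ID it saw so everything after it can be replayed. The class and method names are hypothetical, and the rule "replay everything if the ID is unknown" is an assumption, not something stated in the discussion.

```python
import uuid
from collections import OrderedDict


class ImportantQueue:
    def __init__(self):
        self._pending = OrderedDict()  # stanza id -> stanza, in send order

    def send(self, stanza: str) -> str:
        """Queue a stanza under a fresh unique ID and return that ID."""
        stanza_id = str(uuid.uuid4())
        self._pending[stanza_id] = stanza
        return stanza_id

    def acknowledge(self, stanza_id: str) -> None:
        """Drop a stanza once the peer confirmed it."""
        self._pending.pop(stanza_id, None)

    def resync(self, last_seen_by_peer):
        """Return (id, stanza) pairs to replay after session setup.

        If the peer's last seen ID is still queued, replay what follows it;
        otherwise (already acknowledged, or None) replay everything queued.
        """
        ids = list(self._pending)
        if last_seen_by_peer in self._pending:
            ids = ids[ids.index(last_seen_by_peer) + 1:]
        return [(i, self._pending[i]) for i in ids]
```

Unlike a counter, the IDs stay meaningful across restarts as long as the queue itself is persisted, but the queue still needs a size limit, which is the objection raised next.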

  116. jonas’

    dwd, problem with the outbound queues is their size will be limited at some point, pushing the problem only back. A highly active MIX could exhaust that limit during your kernel upgrade.

  117. jonas’

    though SM and outbound queues are still problematic anyways...

  118. jonas’

    so in fact, a server would have to do pull-based MAM sync whenever it is not able to SM-resume with the other side, no matter the reason.

  119. jonas’

    look, that sounds like what clients do!

  120. Zash

    MIX is server side MUC?

  121. dwd

    jonas’, For all its users?

  122. jonas’

    and then the server would have to both sync the messages into MAM *and* replay them live to already-connected clients (which may have synced with the *local* MAM already and think they’re up-to-date) *and* queue and delay any *live* messages from the MIX so that everything arrives in order
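
The bookkeeping described here can be sketched roughly as follows, in Python. This is a toy model under stated assumptions: message IDs are unique, the catch-up fetch returns messages in channel order, and the class `ChannelRelay` with its methods is invented for illustration. Live messages arriving during a catch-up are held back and released afterwards, with duplicates removed.

```python
class ChannelRelay:
    """Holds live messages while a catch-up runs, then delivers in order."""

    def __init__(self):
        self.archive = []     # local archive, append-only
        self.delivered = []   # what connected clients were sent, in order
        self._seen = set()
        self._catching_up = False
        self._held_live = []

    def _emit(self, msg_id, body):
        if msg_id in self._seen:
            return  # already archived and delivered, skip the duplicate
        self._seen.add(msg_id)
        self.archive.append((msg_id, body))
        self.delivered.append((msg_id, body))

    def start_catch_up(self):
        """Called when stream resumption failed and a pull-based sync begins."""
        self._catching_up = True

    def on_live(self, msg_id, body):
        if self._catching_up:
            self._held_live.append((msg_id, body))  # delay until sync is done
        else:
            self._emit(msg_id, body)

    def finish_catch_up(self, fetched):
        """Apply the fetched backlog first, then the held live messages."""
        for msg_id, body in fetched:
            self._emit(msg_id, body)
        for msg_id, body in self._held_live:
            self._emit(msg_id, body)
        self._held_live.clear()
        self._catching_up = False
```

Even in this simplified form the relay needs per-channel state and a buffer whose size depends on how long the catch-up takes, which gives a feel for why the approach is considered painful.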

  123. dwd

    jonas’, And you think this is better scaling than a 0198 queue?

  124. jonas’

    dwd, I think this serves a different purpose than a '198 queue

  125. dwd

    jonas’, Is that purpose to make everyone's life harder? If so, mission accomplished. :-)

  126. jonas’

    dwd, the purpose is to achieve reliable message delivery

  127. jonas’

    I’m not sure how you’re going to achieve that with '198 alone. It hasn’t sufficed for c2s, it won’t suffice for s2s either.

  128. Zash

    Ugh, sync :(

  129. jonas’

    dwd, there are two key guarantees which need to be held which make this very hard:
    - In-order message delivery from the MIX to the client
    - No insertions into the middle of the user’s MAM archive

  130. Ge0rG

    Can't we just have forever-persistent 0198?

  131. jonas’

    and this is why Ge0rG (sorry for putting words in your mouth again) and I have been saying that the user’s local archive is a terrible idea and only going to cause pain.

  132. Zash

    Can't we just embrace fast delivery or failure notification?

  133. MattJ

    Notify the MIX that delivery failed

  134. Ge0rG

    Zash: there was a one-message thread on message errors some time ago...

  135. Ge0rG

    MattJ: and then the MIX can kick the user out! Win-win!

  136. MattJ

    Sounds like a plan

  137. Ge0rG

    And when the user rejoins, they just do a full sync!

  138. Zash

    Message attachments in the form of delivery statuses?

  139. MattJ

    I knew MIX could solve all the problems of MUC in the simplest possible way

  140. Ge0rG

    Or you just add a tombstone to all local user archives whenever s2s fails

  141. dwd

    Ge0rG, Put in a tombstone for every missed message. Seems legit.

  142. Ge0rG

    dwd: but you don't know which / how many messages you missed!

  143. dwd

    Ge0rG, You would if you had tombstones.

  144. jonas’

    dwd, how is the server to know how many messages it missed?

  145. dwd

    jonas’, Because of the tombstones. I fail to see how you can fault my logic here.

  146. Ge0rG

    dwd: aaah, right! The tombstones! It's obvious to me now!

  147. jonas’

    sorry, my sarcasmometer is out-of-service due to the heatwave

  148. dwd

    jonas’, Storm Ellen here, my smilies have been blown away.

  149. Ge0rG

    surely just a random weather phenomenon not related in any way to ocean heating or the CO2 amounts in the atmosphere

  150. jonas’

    Ge0rG, surely.

  151. jonas’

    Ge0rG, ThiS iS a SAfE sPaCe gO AWaY wiTH yOUr ClIMAtE ChaNGE IdeOLogY!

  152. jonas’

    s/ChaNGE/CaTAsTrOPHy/, too

  153. Ge0rG


  154. Holger

    > and this is why Ge0rG (sorry for putting words in your mouth again) and I have been saying that the user’s local archive is a terrible idea and only going to cause pain.
    I'm saying that all day long (can't remember saying anything else since I first heard of MIX) and would still very much prefer to keep this feature optional.

  155. Ge0rG

    Holger: but you are not the XEP author.

  156. MattJ

    Still, if we have community consensus that this design is flawed, we can still change it, right? If the MIX has an archive anyway, clients just need to query that instead

  157. MattJ


  158. MattJ

    Relatedly: https://framapiaf.org/@debacle/104713005724817353 (thanks debacle)

  159. Kev

    The only reason for MIX to need to be in the user's archive is for search.

  160. Kev

    (From memory)

  161. Kev

    Well, bandwidth too, but mostly search.

  162. MattJ

    If that's the case, I'm happy with that

  163. Holger

    MattJ: At least during the Summit it seemed to me the consensus is to have user archives. But yes I'd think we should just have clients check a feature (IIRC there is one in MIX-PAM) to decide which archive to query. Increases complexity on the client side which I'm usually all for avoiding at all cost, but seems the least evil to me in this case.

  164. Kev

    If someone could come up with a decent solution to search it would be nice to drop it.

  165. Kev

    (Ok, I think I'm up to three reasons - search, bandwidth, and persistence)

  166. MattJ

    Yeah, I think I'd rather tackle the search problem than turn the current architecture on its head and face a whole set of other problems

  167. Holger

    Kev: I also remember scalability arguments, i.e. the case where the client is joined to thousands of rooms.

  168. Holger

    And persistence, yes.

  169. Holger

    I get all that but I think we'd need to solve a couple of problems before at least I would be able to implement user archives. And it would be nice if that wouldn't block MIX.

  170. Kev

    You really don't want to be in a room that keeps history for a day, come back two days later and not see the replies to your messages.

  171. Kev

    It's one of those 'no ideal solutions' things, I think, that just hurts because of federated architectures.

  172. Holger

    Right now the user server would either duplicate to death or have to implement black dedup magic. Plus the sync issues.

  173. Kev

    User archives seemed like the least bad solution.

  174. jonas’

    Kev, so a user archive which seems complete, but has gaps nobody knows about is better than an archive which tells you "sorry, I don’t have your newest message, there is a gap here"

  175. Holger

    So I think right now it's a horrible combination of the downsides of Matrix with those of XMPP.

  176. jonas’

    that logic seems flawed to me

  177. Holger

    I think.

  178. Kev

    jonas’: Dear Strawman, love...?

  179. eta

    so, the thing I never got with XEP-0045 is

  180. jonas’

    Kev, I can’t follow, sorry

  181. jonas’

    my english fails me

  182. Kev

    jonas’: You're presenting a position that isn't mine, then arguing that it's wrong, and therefore I am.

  183. eta

    why can't the server just keep a log of what was said after a resource left, persist that, and then replay it as join history?

  184. jonas’

    Kev, sorry, not my intention, maybe I missed something

  185. Kev

    I don't think user archives should have gaps in them, which is why we needed the sync logic between MIX and User archive.

  186. Holger

    eta: It could. MAM is just nicer because of the paging.

  187. jonas’

    Kev, ok, I missed that, sorry

  188. jonas’

    ignore me :)

  189. jonas’

    ignore me & carry on :)

  190. Kev

    It was a discussion at the Summit about how we needed to ensure we could detect holes in the user's view of the MIX archive, and plug them.

  191. MattJ

    eta, paging and tracking is harder than you think (usually people "leave" the MUC long after they lost their connection)

  192. Kev

    Maybe it was a side-room discussion, I forget at this point.

  193. jonas’

    I might’ve missed that discussion :/

  194. eta

    MattJ, ah right, reasonable enough

  195. jonas’

    eta, also unbounded storage requirements on the server side. What if the resource never comes back?

  196. eta

    jonas’, well I guess you'd need some cap

  197. eta

    but nvm, I'm convinced MAM is probably ideal

  198. jonas’

    eta, also, MAM-MUC is pretty much that, except that the client has to explicitly ask ;)

  199. MattJ

    Just like with MAM. In fact Prosody (and I'd be surprised if not other servers) uses the MAM archive to fulfil MUC history requests these days

  200. Holger

    BTW 0369, 7.2.1 says the user's server MUST archive, 7.2.2 says it MAY?

  201. dwd

    Holger, Take the average, it's a SHOULD.

  202. Holger


  203. dwd

    Although if it says MUST twice and MAY once, it's a REALLY SHOULD.

  204. Holger

    Makes sense.

  205. Holger

    I'm okay with MUST archive as long as I MAY apply an arbitrary expiry period of 0 or more seconds.

  206. jonas’


  207. jonas’

    Holger, you are aware that you can’t do that as a user server, right?

  208. jonas’

    not without purging all the other messages too

  209. jonas’

    due to how MAM works ;)

  210. Holger

    Well with our specific implementation (stanza ID being a timestamp) things wouldn't break.

  211. jonas’

    holes in MAM are forbidden

  212. Holger

    Yes my statement was just that "things wouldn't break".

  213. jonas’

    only with an expiry of exactly 0s implemented by not visibly storing the messages

  214. jonas’

    otherwise you have to violate '313 by not returning the correct response for an ID not in your archive.

  215. jonas’

    > If any UID requested by the client in any of the 'before-id', 'after-id' or 'ids' form fields is not present in the archive, the server MUST return an item-not-found error in response to the query.
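
A toy Python model of the rule quoted above and of how a client can use it to notice a gap. The archive is just a list of `(uid, body)` pairs, and `query_after`, `sync`, and `ItemNotFound` are illustrative names, not a real library API. The fallback in `sync` (refetch from the start and flag a gap) is one assumed client strategy among several.

```python
class ItemNotFound(Exception):
    """Stands in for the item-not-found error in this sketch."""


def query_after(archive, after_id, limit=50):
    """Return up to `limit` entries following after_id.

    archive is a list of (uid, body) pairs in archive order. An after_id
    that is not present (for example because it expired) raises ItemNotFound.
    """
    ids = [uid for uid, _ in archive]
    if after_id is None:
        start = 0
    elif after_id in ids:
        start = ids.index(after_id) + 1
    else:
        raise ItemNotFound(after_id)
    return archive[start:start + limit]


def sync(archive, last_known_id):
    """Client side: return (messages, gap_detected)."""
    try:
        return query_after(archive, last_known_id), False
    except ItemNotFound:
        # The server no longer knows our last message, so there may be a gap.
        return query_after(archive, None), True
```

A server that silently answers with "everything newer" for an unknown ID, as a timestamp-based implementation could, would make `gap_detected` always False, so the client could not tell a complete sync from one with missing messages.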

  216. Holger

    Yes that's what I'm doing.

  217. jonas’

    Shame on you! Violating a MUST!

  218. jonas’

    (also, yes it is friday)

  219. Holger

    Wasn't in earlier revisions IIRC, and isn't in 0059. We have generic 0059 code to do 0059 with MAM and other things, and I'm not keen on "if MAM then this else that" special casing.

  220. jonas’

    nothing to do with '59 that bit of text

  221. jonas’

    gotta go now tho

  222. Holger

    Not sure how to parse that. 0059 says what to do if a UID requested is not present in the archive. 0313 says something else on the same topic.

  223. Holger

    Generic implementation is one reasoning, the other is avoiding the additional SQL query on each and every MAM request.

  224. MattJ

    The error is to allow clients to detect gaps in the sync

  225. MattJ

    if that's not obvious

  226. Holger

    I understand the reasoning but that doesn't make me like the solution ;-)

  227. Guus

    Does anyone know an administrator for jabber.cz?

  228. Guus

    They might want to review their user using smash55 for its username

  229. dwd

    What would things look like if MIX messages did not go into the user's archive ever, and instead were fetched on demand - could this be managed by the server or would it have to be managed by the client?

  230. Guus

    dwd: re your question on twitter: try Greg and Dele

  231. !XSF_Martin

    Guus: What's the issue with that username?

  232. Guus

    !XSF_Martin: it is the point of contact that's advertised in spam messages

  233. !XSF_Martin

    Ah I misinterpreted
    > review their user … for its username
    So I thought the username is the problem itself. Sorry, not a native speaker.

  234. dwd

    Guus, Oh, good call. Though i think that's maybe the wrong bit of BT entirely.

  235. Guus

    dwd: could be, but I believe they've both moved around

  236. Guus

    Worth a shot