jdev - 2020-08-11


  1. Zash

    mod_smacks did have such limits for a short time, but they caused exactly this problem and were then removed until someone could come up with a better way to deal with it

  2. Ge0rG

    was that the thing that made my server explode due to an unbound smacks queue? ;)

  3. Zash

    I got the impression that yesterday's discussion was about the opposite problem, killing the session once the queue is too large

  4. Zash

    So, no, not that issue.

  5. lovetox

    it turned out the user who reported that problem had a queue size of 1000

  6. Ge0rG

    what lovetox describes sounds like a case of too much burst traffic, not of socket synchronization issues

  7. lovetox

    the current ejabberd default is much higher

  8. lovetox

    but a few versions ago it was 1000

  9. Ge0rG

    if you join a dozen MUCs at the same time, you might well run into a 1000 stanza limit

  10. lovetox

    1000 is like nothing

  11. lovetox

    you can't even join one IRC room like #ubuntu or #python

  12. Ge0rG

    lovetox: only join MUCs one at a time ;)

  13. lovetox

    you get instantly disconnected

  14. Ge0rG

    Matrix HQ

  15. Ge0rG

    unless the bridge is down ;)

  16. lovetox

    the current ejabberd default is 5000

  17. lovetox

    which until now works ok

  18. Ge0rG

    I'm sure the Matrix HQ has more users. But maybe it's slow enough in pushing them over the bridge that you can fetch them from your server before you are killed

  19. Kev

    Ge0rG: Oh, you're suggesting that it's a kill based on timing out an ack because the server is ignoring that it's reading stuff from the socket that's acking old stanzas, rather than timing out on data not being received?

  20. Kev

    That seems plausible.

  21. Ge0rG

    Kev: I'm not sure it has to do with the server's reading side of the c2s socket at all

  22. Holger <- still trying to parse that sentence :-)

  23. Zash

    Dunno how ejabberd works but that old mod_smacks version just killed the session once it hit n queued stanzas.

  24. Ge0rG

    Holger: I'm not sure I understood it either

  25. Ge0rG

    Kev: I think it's about N stanzas suddenly arriving for a client, with N being larger than the maximum queue size

  26. Holger

    Zash: That's how ejabberd works. Yes that's problematic, but doing nothing is even more so, and so far I see no better solution.

  27. Kev

    Ah. I understand now, yes.

  28. Ge0rG

    Holger: you could add a time-based component to that, i.e. allow short bursts to exceed the limit

  29. Ge0rG

    give the client a chance to consume the burst and to ack it
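
    A minimal sketch of the time-based allowance Ge0rG suggests here: a soft cap that short bursts may exceed for a grace period, plus a hard cap that always kills the session. Everything below (names, limits, the grace window) is illustrative, not how mod_smacks or ejabberd actually implement it.

```python
import time

class SmacksQueue:
    """Illustrative unacked-stanza queue with a burst grace window (hypothetical)."""

    def __init__(self, soft_limit=1000, hard_limit=5000, grace_seconds=30):
        self.soft_limit = soft_limit     # steady-state cap
        self.hard_limit = hard_limit     # absolute cap, always enforced
        self.grace_seconds = grace_seconds
        self.queue = []
        self.over_soft_since = None      # when we first exceeded the soft cap

    def push(self, stanza, now=None):
        now = now if now is not None else time.monotonic()
        self.queue.append(stanza)
        if len(self.queue) > self.hard_limit:
            return "kill-session"        # even a burst must not exceed this
        if len(self.queue) > self.soft_limit:
            if self.over_soft_since is None:
                self.over_soft_since = now   # burst started; give the client time to ack
            elif now - self.over_soft_since > self.grace_seconds:
                return "kill-session"    # burst lasted too long without enough acks
        return "ok"

    def ack(self, acked_count):
        del self.queue[:acked_count]
        if len(self.queue) <= self.soft_limit:
            self.over_soft_since = None  # back under the limit, reset the grace timer
```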

  30. Holger

    I mean if you do nothing it's absolutely trivial to OOM-kill the entire server.

  31. Ge0rG

    Holger: BTDT

  32. Zash

    Ge0rG: Wasn't that a feedback loop tho?

  33. Ge0rG

    Zash: yes, but a queue limit would have prevented it

  34. Kev

    It's not clear to me how you solve that problem reasonably.

  35. Ge0rG

    Keep an eye on the largest known MUCs, and make the limit slightly larger than the combined occupant count of the top 5 rooms

  36. Ge0rG

    And by MUCs, I also mean bridged rooms of any sort

  37. Holger

    Get rid of stream management :-)

  38. Zash

    Get rid of the queue

  39. Ge0rG

    Get rid of clients

  40. Kev

    I think this is only related to stream management, no? You end up with a queue somewhere?

  41. Zash

    Yes! Only servers!

  42. jonas’

    Zash, peer to peer?

  43. Zash

    NOOOOOOOO

  44. Ge0rG

    Holger, Zash: we could implement per-JID s2s backpressure

  45. Zash

    Well no, but yes

  46. Holger

    Kev: You end up with stored MAM messages.

  47. Ge0rG

    s2s 0198, but scoped to individual JIDs

  48. Ge0rG

    also that old revision that allowed requesting throttling from the remote end

  49. Zash

    You could make it so that resumption is not possible if there's more unacked stanzas than a (smaller) queue size

  50. Zash

    At some point it's going to be just as expensive to start over with a fresh session

  51. Ge0rG

    Zash: a client that auto-joins a big MUC on connect will surely cope with such invisible limits

  52. Holger

    Where you obviously might want to implement some sort of disk storage quota, but that's less likely to be too small for clients to cope. Also the burst is often just presence stanzas, which we might be able to reduce/avoid some way.

  53. Zash

    Soooo, presence based MUC is the problem yet again

  54. Holger

    Anyway, until you guys fix all these things for me, I'll want to have a queue size limit :-)

  55. Zash

    I remember discussing MUC optimizations, like skipping most initial presence for large channels

  56. Ge0rG

    we need incremental presence updates.

  57. Holger

    ejabberd's room config has an "omit that presence crap altogether" knob. I think p1 customers usually press that and then things suddenly work.

  58. eta

    isn't there an XEP for room presence list deltas

  59. eta

    I also don't enjoy getting megabytes of presence upon joining all the MUCs

  60. Zash

    eta: Yeah, XEP-0436 MUC presence versioning

  61. eta

    does anyone plan on implementing it?

  62. Zash

    I suspect someone is. Not me tho, not right now.

  63. Zash

    Having experimented with presence deduplication, I got the feeling that every single presence stanza is unique, making deltas pretty large

  65. eta

    oh gods

  66. Zash

    And given the rate of presence updates in the kind of MUC where you'd want optimizations... not sure how much deltas will help.

  67. Holger

    Yeah I was wondering about the effectiveness for large rooms as well.

  68. Zash

    Just recording every presence update and replaying it like MAM sure won't do. Actual diff will be better, but will it be enough?
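
    For illustration only (this is not the XEP-0436 wire format): a small sketch of diffing two occupant-presence snapshots, which also shows why the delta stays large when nearly every payload changes.

```python
def presence_delta(old, new):
    """Diff two {nickname: presence_payload} snapshots of a room (illustrative)."""
    joined  = {nick: new[nick] for nick in new.keys() - old.keys()}
    left    = sorted(old.keys() - new.keys())
    changed = {nick: new[nick]
               for nick in old.keys() & new.keys()
               if old[nick] != new[nick]}
    return {"joined": joined, "left": left, "changed": changed}

# If every occupant's payload differs between snapshots (caps hash, idle time,
# status text, ...), "changed" contains nearly everyone and the delta is barely
# smaller than sending the full occupant list again.
```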

  69. Zash

    Would be nice to have some kind of numbers

  70. Ge0rG

    So we need to split presence into "room membership updates" and "live user status updates"?

  71. Zash

    MIX?

  72. Zash

    Affiliation updates and quitjoins are easy enough to separate

  73. Ge0rG

    and then we end up with matrix-style rooms, and some clients joining and leaving the membership all the time

  74. Zash

    So we have affiliations, currently present nicknames (ie roles) and presence updates

  75. Zash

    I've been thinking along the lines of that early CSI presence optimizer, where you'd only send presence for "active users" (spoke recently or somesuch). Would be neat to have a summary-ish stanza saying "I just sent you n out of m presences"
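
    A rough sketch of that "only send presence for active users" idea plus an "n out of m" summary; the activity window, page size and data shapes are made up for illustration.

```python
from datetime import datetime, timedelta

def select_initial_presences(occupants, active_window=timedelta(minutes=30), page_size=50):
    """Pick which occupant presences to send on join: recently active ones first.

    `occupants` is a list of (nick, last_spoke, presence) tuples, last_spoke being
    a datetime or None. Returns (presences_to_send, summary), where summary is the
    "I just sent you n out of m presences" hint. Illustrative only.
    """
    cutoff = datetime.utcnow() - active_window
    active = [o for o in occupants if o[1] is not None and o[1] >= cutoff]
    page = sorted(active, key=lambda o: o[1], reverse=True)[:page_size]
    summary = {"sent": len(page), "total": len(occupants)}
    return [o[2] for o in page], summary
```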

  76. Zash

    You could also ignore pure presence updates from unaffiliated users and that kind of thing

  77. Ge0rG

    also you only want to know the total number of users and the first page full of them, the other ones aren't displayed anyway ;)

  78. Zash

    Yeah

  79. flow

    Zash> Soooo, presence based MUC is the problem yet again
    I think the fundamental design problem is pushing stanzas instead of recipients requesting them. Think for example a participant of a high-traffic MUC using a low-throughput connection (e.g. GSM). That MUC could easily kill the participant's connection

  80. Zash

    You do request them by joining.

  81. flow

    Zash, sure, let me clarify: requesting them in smaller batches (e.g. MAM pagination style)

  83. Zash

    You just described how Matrix works btw

  84. flow

    I did not know that, but it appears like one (probably sensible) solution to the flow control / traffic management problem we have

  85. jonas’

    or like MIX ;D

  86. Ge0rG

    let's just do everything in small batches.

  87. flow

    correct me if I am wrong, but MIX's default modus operandi is still to fan-out all messages

  88. jonas’

    I think only if you subscribe to messages

  89. jonas’

    also, I thought we were talking about *presence*, not messages.

  90. flow

    I think the stanza kind does not matter

  91. flow

    if someone sends you stanzas at a higher rate than you can consume, some intermediate queue will fill

  92. jonas’

    yeah, well, that’s true for everything

  93. flow

    hence I wrote "fundamental design problem"

  94. jonas’

    I can see the case for MUC/MIX presence because that’s a massive amplification (you send single presence, you get a gazillion and a continuous stream back)

  95. jonas’

    yeah, no, I don’t believe in polling for messages

  96. Kev

    The main issue is catchup.

  97. jonas’

    if you’re into that kind of stuff, use BOSH

  98. flow

    I did not say anything about polling

  99. Kev

    Whether when you join you receive a flood of everything, or whether you request stuff when you're ready for it, in batches.

  100. Kev

    Using MAM on MIX is meant to give you the latter.
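
    The "request stuff in batches when you're ready" approach maps onto a MAM (XEP-0313) query limited with RSM (XEP-0059). A hand-rolled sketch of one such page follows; a real client would use its XMPP library's stanza objects rather than string concatenation.

```python
from xml.sax.saxutils import escape

def mam_page_query(query_id, page_size=50, after_id=None):
    """Build one paged MAM query (XEP-0313) limited via RSM (XEP-0059)."""
    rsm = f"<max>{page_size}</max>"
    if after_id is not None:
        # resume after the last archive id we already processed
        rsm += f"<after>{escape(after_id)}</after>"
    return (
        f"<iq type='set' id='{escape(query_id)}'>"
        f"<query xmlns='urn:xmpp:mam:2' queryid='{escape(query_id)}'>"
        f"<set xmlns='http://jabber.org/protocol/rsm'>{rsm}</set>"
        f"</query></iq>"
    )

# The client only asks for the next page once it has processed the previous one,
# so a slow link never has more than `page_size` messages in flight.
```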

  101. flow

    and yes, the problem is more likely caused by presence stanzas, but could be caused by IQs or messages as well

  102. Kev

    If you have a room that is itself generating 'live' stanzas at such a rate that it fills queues, that is also a problem, but is distinct from the 'joining lots of MUCs disconnects me' problem.

  103. flow

    Kev, using the user's MAM service or the MIX channel's MAM service?

  104. Kev

    Both use the same paging mechanic.

  105. jonas’

    12:41:06 flow1> Zash, sure, let me clarify: requesting them in smaller batches (e.g. MAM pagination style)
    how is that not polling then?

  106. jonas’

    though I sense that this is a discussion about semantics I don’t want to get into right now.

  107. flow

    right, I wanted to head towards the question on how to be notified that there are new messages that you may want to request

  108. jonas’

    by receiving a <message/> with the message.

  109. flow

    that does not appear to be a solution, as you easily run into the same problem

  110. jonas’

    [citation needed]

  111. flow

    I was thinking more along the lines of infrequent/slightly delayed notifications with the current stanza/message head IDs

  112. Holger

    MAM/Sub!

  114. flow

    but then again, it does not appear to be an elegant solution (or potentially is no solution at all)

  115. Zash

    Oh, this is basically the same problem as IP congestion, is it not?

  116. Zash

    And the way to solve that is to throw data away. Enjoy telling your users that.

  117. Zash

    > The main issue is catchup.
    This. So now you'll have to figure out what data got thrown away and fetch it.

  118. Zash

    (Also how Matrix works.)

  119. eta

    the one thing that may be good to steal from matrix is push rules

  120. eta

    i.e. some server side filtering you can do to figure out what should generate a push notification

  121. Zash

    Can you rephrase that in a way that doesn't make me want to say "but they stole this from us"

  122. eta

    well so CSI filtering is an XMPP technology, right

  123. eta

    but there's no API to extend it

  124. eta

    like you can't say "please send me everything matching the regex /e+ta/"

  125. Zash

    "push rules" meaning what, exactly?

  126. pep.

    Zash: it's just reusing good ideas :p

  127. Zash

    You said "push notifications", so I assumed "*mobile* push notifications"

  128. Ge0rG

    Zash: a filter that the client can define to tell the server what's "important"

  129. Zash

    AMP?

  130. eta

    Zash, so yeah, push rules are used for mobile push notifications in Matrix

  131. Zash

    Push a mod_firewall script? 🙂

  132. Ge0rG

    for push notifications, the logic is in the push server, which is specific to the client implementation

  133. Zash

    eta: So you mean user-configurable rules?

  134. eta

    Zash, yeah

  135. Ge0rG

    not rather client-configurable?

  136. eta

    I mean this is ultimately flawed anyway because e2ee is a thing

  137. Zash

    Everything is moot because E2EE

  138. Ge0rG

    I'm pretty sure there is no place in matrix where you can enter push rule regexes

  139. pulkomandy

    Is the problem really to be solved on the client-server link? What about some kind of flow control on the s2s side instead? (no idea about the s2s things in xmpp, so maybe that's not doable)

  140. eta

    Ge0rG, tada https://matrix.org/docs/spec/client_server/r0.6.1#m-push-rules

  141. Zash

    Ge0rG: Keywords tho, which might be enough

  142. eta

    you can have a "glob-style pattern"

  143. Zash

    Ugh

  144. Ge0rG

    eta: that's not what I mean

  145. Ge0rG

    eta: show me a Riot screenshot where you can define those globs

  146. eta

    Ge0rG, hmm, can't you put them into the custom keywords field

  147. pulkomandy

    If you try to solve it on the client side you will invent something like TCP windows, which is indeed a way to solve IP congestion, and doesn't work here because congestion on the server-to-client socket doesn't propagate to other links

  148. eta doesn't really care about this argument though and is very willing to just concede to Ge0rG :p

  149. Zash

    What was that thing in XEP-0198 that got removed? Wasn't that rate limiting?

  150. Ge0rG

    Zash: yes

  151. eta

    I think the presence-spam-in-large-MUCs issue probably needs some form of lazy loading, right

  152. eta

    like, send user presence before they talk

  153. eta

    have an API (probably with RSM?) to fetch all user presences

  154. Zash

    eta: Yeah, that's what I was thinking

  155. eta

    the matrix people had pretty much this exact issue and solved it the same way

  156. Zash

    Oh no, then we need to do it differently!!11!!11!!1 eleven

  157. eta

    Zash, it's fine, they use {} brackets and we'll use <> ;P

  158. Zash

    Phew 😃

  159. eta

    the issue with lots of messages in active MUCs is more interesting though

  160. eta

    like for me, Conversations chews battery because I'm in like 6-7 really active IRC channels

  161. eta

    so my phone never sleeps

  162. eta

    I've been thinking I should do some CSI filtering, but then the issue is you fill up the CSI queue

  163. Zash

    A thing I've almost started stealing from Matrix is room priorities.

  164. Zash

    So I have a command where I can mark public channels as low-priority, and then nothing from those gets pushed through CSI

  165. Ge0rG

    eta: the challenge here indeed is that all messages will bypass CSI, which is not perfect

  166. eta

    Zash, yeah, there's that prosody module for that

  167. Ge0rG

    eta: practically speaking, you might want to have a wordlist that MUC messages must match to be pushed

  168. eta

    I almost feel like the ideal solution is something more like

  169. eta

    I want the server to join the MUC for me

  170. eta

    I don't want my clients to join the MUC (disable autojoin in bookmarks)

  171. eta

    and if I get mentioned or something, I want the server to somehow forward the mentioned message

  172. Ge0rG

    eta: your client still needs to get all the MUC data, eventually

  173. eta

    Ge0rG, sure

  174. eta

    but, like, I'll get the forwarded message with the highlight

  175. eta

    then I can click/tap on the MUC to join it

  176. Ge0rG

    eta: so CSI with what Zash described is actually good

  177. eta

    and then use MAM to lazy-paginate

  178. eta

    Ge0rG, yeah, but it fills up in-memory queues serverside

  179. Ge0rG

    eta: but I think that command is too magic for us mortals

  180. Ge0rG

    eta: yes, but a hundred messages isn't much in the grand scheme of things

  181. eta

    Ge0rG, a hundred is an underestimate ;P

  182. eta

    some of the IRC channels have like 100 messages in 5 minutes or something crazy

  183. Holger

    https://jabber.fu-berlin.de/share/holger/EuIflBOiuR0UyOtA/notifications.jpeg

  184. Holger

    C'mon guys this is trivial to solve.

  185. Ge0rG

    my prosody is currently consuming ~ 500kB per online user

  186. Holger

    https://jabber.fu-berlin.de/share/holger/aIlgwvzEMWv66zF9/notifications.jpeg

  187. Holger

    Oops.

  188. eta

    Zash, also ideally that prosody module would use bookmarks

  189. eta

    instead of an ad-hoc command

  190. Ge0rG

    eta: naah

  191. Zash

    Bookmarks2 with a priority extension would be cool

  192. Ge0rG

    we need a per-JID notification preference, like "never" / "always" / "on mention" / "on string match"

  193. Ge0rG

    which is enforced by the server
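
    A sketch of that per-JID preference applied as a server-side check on whether a groupchat message should pierce CSI; the room JIDs, modes and function names are hypothetical, not an existing module.

```python
import re

# Per-room preference, following Ge0rG's "never" / "always" / "on mention" / "on string match".
PREFS = {
    "bigroom@muc.example.org": {"mode": "on mention"},
    "noisy@muc.example.org":   {"mode": "never"},
    "alerts@muc.example.org":  {"mode": "on string match", "pattern": r"deploy|outage"},
}

def pierces_csi(room_jid, body, my_nick, prefs=PREFS):
    """Decide server-side whether a groupchat message should wake the client."""
    pref = prefs.get(room_jid, {"mode": "always"})
    mode = pref["mode"]
    if mode == "never":
        return False
    if mode == "on mention":
        return my_nick.lower() in (body or "").lower()
    if mode == "on string match":
        return re.search(pref["pattern"], body or "", re.IGNORECASE) is not None
    return True   # "always" and anything unrecognised pass through
```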

  194. eta

    Ge0rG: that's a different thing though

  195. Ge0rG

    eta: is it really?

  196. Ge0rG

    eta: for mobile devices, CSI-passthrough is only relevant for notification causing messages

  197. eta

    Ge0rG: ...actually, yeah, I agree

  198. Ge0rG

    you want to get pushed all the messages that will trigger a notification

  199. Ge0rG

    which ironically means that all self-messages get pushed through so that the mobile client can *clear* notifications

  200. Ge0rG

    which ironically also pushes outgoing Receipts

  201. Ge0rG

    eta: I'm sure I've written a novel or two on standards regarding that

  202. Ge0rG

    or maybe just in the prosody issue tracker

  203. Ge0rG

    eta: also CSI is currently in Last Call, so feel free to add your two cents

  204. Zash

    Ironically?

  205. Ge0rG isn't going to re-post his "What's Wrong with XMPP" slide deck again

  206. Ge0rG

    Also the topic of notification is just a TODO there.

  207. Zash

    Heh

  208. Zash

    > you want to get pushed all the messages that will trigger a notification
    and that's roughly the same set that you want archived and carbon'd, I think, but not exactly

  209. eta

    Ge0rG: wait that sounds like an interesting slide deck

  210. eta

    Zash: wild idea, just maintain a MAM archive for "notifications"

  211. eta

    I guess a pubsub node would also work

  212. eta

    and you shove all said "interesting" messages in there

  213. Ge0rG

    eta: https://op-co.de/tmp/whats-wrong-with-xmpp-2017.pdf

  214. Zash

    eta: MAM for the entire stream?

  215. Zash

    Wait, what's "notifications" here?

  216. Zash

    Stuff that causes the CSI queue to get flushed? Most of that'll be in MAM already.

  217. eta

    Zash: well mentions really

  218. Ge0rG

    eta: MAM doesn't give you push though

  219. eta

    Ge0rG: okay, after reading those slides I'd say that's a pretty good summary and proposal

  220. Ge0rG

    eta: all it needs is somebody to implement all the moving parts

  221. Zash

    Break it into smaller (no, even smaller!) pieces and file bug reports?

  222. Zash

    /correct feature requests*

  223. Ge0rG

    when I break it into pieces this small, the context gets lost

  224. Ge0rG

    like just now I realized there might be some smarter way to handle "sent" carbons in CSI, than just passing all through

  225. Zash

    One huge "do all these things" isn't great either

  226. Ge0rG

    but maybe a sent carbon of a Receipt isn't too bad after all because it most often comes shortly after the original message that also pierced CSI?

  227. Ge0rG

    did I mention that I'm collecting large amounts of data on the number of CSI wakeups and the reasons for them?

  228. Zash

    Possibly

  229. Ge0rG

    and that the #1 reason used to be disco#info requests to the client?

  230. Zash

    Possibly (re carbon-receipts)

  231. Zash

    Did I mention that I too collected stats on that, until I discovered that storing stats murdered my server?

  232. Ge0rG

    I'm only "storing" them in prosody.log, and that expires after 14d

  233. Ge0rG

    but maybe somebody wants to bring them to some use?

  234. Zash

    disco#info cache helped a *lot* IIRC

  235. Zash

    I also found that a silly amount of wakeups were due to my own messages on another device, after which I wrote a grace period thing for that.

  236. Zash

    IIRC before I got rid of stats collection it was mostly client-initiated wakeups that triggered CSI flushes

  237. Ge0rG

    Zash: "own messages on other device" needs some kind of logic maybe

  238. Ge0rG

    like: remember the last message direction per JID, only wake up on outgoing read-marker / body when direction changes?

  239. Zash

    Ge0rG: Consider me, writing here, right now, on my work station. Groupchat messages sent to my phone.

  240. Ge0rG

    just waking up on outgoing read-marker / body would be a huge improvement already

  241. Ge0rG

    Zash: yes, that groupchat message is supposed to clear an eventual notification for the groupchat

  242. Ge0rG

    that = your

  243. Zash

    After the grace period ends, if there were anything high-priority since the last activity from that other client, then it should push.
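
    A sketch of the grace-period idea Zash describes: while another of the user's devices is active, hold high-priority stanzas for the idle one, then push whatever accumulated once the window expires. The class, names and 60-second window are illustrative.

```python
import time

class GracePeriod:
    """Illustrative grace-period tracker for one idle device of a user."""

    def __init__(self, seconds=60):
        self.seconds = seconds
        self.other_device_active_at = float("-inf")
        self.pending = []

    def note_other_device_activity(self, now=None):
        self.other_device_active_at = now if now is not None else time.monotonic()

    def on_high_priority(self, stanza, now=None):
        now = now if now is not None else time.monotonic()
        if now - self.other_device_active_at < self.seconds:
            self.pending.append(stanza)   # the user is looking at another screen
            return "hold"
        return "push"

    def on_grace_expired(self):
        """Anything held during the window should now be pushed to the idle device."""
        held, self.pending = self.pending, []
        return held
```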

  244. Zash

    Not done that yet tho I think

  245. Zash

    But as long as I'm active at another device, pushing to the phone is of no use

  246. Zash

    Tricky to handle the case of an incoming message just after typing "brb" and grabbing the phone to leave

  247. Zash

    Especially with a per-stanza yes/no/maybe function, it'll need a "maybe later" response

  248. Ge0rG

    Zash: yeah. Everything is HARD

  249. eta

    also, for all Slack's complicated diagrams, their notifications don't even work properly either

  250. eta

    like it doesn't dismiss them on my phone, etc

  251. flow

    Zash> And the way to solve that is to throw data away. Enjoy telling your users that.
    I'd say that's why there is TCP on top of IP (where, I'd argue, the actual congestion and traffic flow control happens)

  252. Zash

    flow: With TCP, same as XMPP, you just end up filling up buffers and getting OOM'd

  253. flow

    Zash, I don't think those two are really comparable: with tcp you have exactly two endpoints, with xmpp one entity communicates potentially with multiple endpoints (potentially over multiple different s2s links)

  256. Zash

    (me says nothing about mptcp)

  257. Zash

    So what Ge0rG said about slowing down s2s links?

  258. flow

    I did not read the full backlog, could you summarize what Ge0rG said?

  259. flow

    (otherwise I have to read it first)

  260. Zash

    13:31:21 Ge0rG "Holger, Zash: we could implement per-JID s2s backpressure"

  261. flow

    besides, aren't in MPTCP still only two endpoints involved (but using potentially multiple paths)?

  263. flow

    I am not sure if that is technically possible; the "per-JID" part here alone could be tricky

  264. flow

    it appears that implementing backpressure would likely involve signalling back to the sender, but what if the path to the sender is also congested?

  265. Zash

    I'm not sure this is even doable without affecting other users of that s2s link

  266. flow

    as of now, the only potential solution I could come up with is keeping the state server-side and having servers notify clients when the state changes, so that clients can sync whenever they want, and especially as fast or slow as they want

  267. flow

    but that does not solve the problem for servers with poor connectivity

  268. jonas’

    let’s change xmpp-s2s to websockets / http/3 or whatever that supports multiple streams, which will of course solve the scheduling issue of streams competing for resources and not at all draw several CVE numbers in the process :)

  269. Zash

    Not impossible to open more parallel s2s links...

  270. jonas’

    one for each JID? :)

  271. jonas’

    one for each local JID? :)

  272. Zash

    Heh, you could open a secondary one for big bursts of stanzas like MUC joins and MAM ....

  273. Zash

    Like I think there were thoughts in the past about using a secondary client connection for vcards

  274. jonas’

    haha wat

  275. Zash

    Open 2 c2s connections. Use one as normal (presence, chat etc. there), except send some requests, like for vcards, over the other one, since those often contain big binary blobs that then wouldn't block the main connection :)

  276. pulkomandy

    Well… at this point you may start thinking about removing TCP (its flow control doesn't work in this case anyway) and doing some kind of XML over UDP instead?

  277. Zash

    At some point it stopped being XMPP

  278. moparisthebest

    QUIC solves this