Zashmod_smacks did have such limits for a short time, but they caused exactly this problem and were then removed until someone can come up with a better way to deal with it
Ge0rGwas that the thing that made my server explode due to an unbound smacks queue? ;)
ZashI got the impression that yesterday's discussion was about the opposite problem, killing the session once the queue is too large
ZashSo, no, not that issue.
lovetoxit turned out the user which reported that problem had a queue size of 1000
Ge0rGwhat lovetox describes sounds like a case of too much burst traffic, not of socket synchronization issues
lovetoxthe current ejabberd default is much higher
lovetoxbut a few versions ago it was 1000
Ge0rGif you join a dozen MUCs at the same time, you might well run into a 1000 stanza limit
lovetox1000 is like nothing
lovetoxyou can't even join one IRC room like #ubuntu or #python
Ge0rGlovetox: only join MUCs once at a time ;)
lovetoxyou get instantly disconnected
Ge0rGMatrix HQ
Ge0rGunless the bridge is down ;)
lovetoxthe current ejabberd default is 5000
lovetoxwhich until now works ok
Ge0rGI'm sure the Matrix HQ has more users. But maybe it's slow enough in pushing them over the bridge that you can fetch them from your server before you are killed
KevGe0rG: Oh, you're suggesting that it's a kill based on timing out an ack because the server is ignoring that it's reading stuff from the socket that's acking old stanzas, rather than timing out on data not being received?
KevThat seems plausible.
Ge0rGKev: I'm not sure it has to do with the server's reading side of the c2s socket at all
Holger<- still trying to parse that sentence :-)
ZashDunno how ejabberd works but that old mod_smacks version just killed the session once it hit n queued stanzas.
Ge0rGHolger: I'm not sure I understood it either
Ge0rGKev: I think it's about N stanzas suddenly arriving for a client, with N being larger than the maximum queue size
HolgerZash: That's how ejabberd works. Yes that's problematic, but doing nothing is even more so, and so far I see no better solution.
KevAh. I understand now, yes.
Ge0rGHolger: you could add a time-based component to that, i.e. allow short bursts to exceed the limit
Ge0rGgive the client a chance to consume the burst and to ack it
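A minimal sketch of the burst-tolerant limit Ge0rG suggests, in Python rather than any server's actual code: the unacked-stanza queue may exceed its cap, but only for a bounded grace window, so a MUC-join burst gets a chance to be consumed and acked before the session is killed. The limit and window values are made up.

```python
import time
from collections import deque

class SmacksQueue:
    def __init__(self, limit=1000, burst_grace=30.0):
        self.limit = limit              # soft cap on unacked stanzas (made up)
        self.burst_grace = burst_grace  # seconds a burst may exceed the cap (made up)
        self.queue = deque()
        self.over_limit_since = None    # when the cap was first exceeded

    def push(self, stanza, now=None):
        """Queue an outgoing stanza; return False if the session should be killed."""
        now = now or time.monotonic()
        self.queue.append(stanza)
        if len(self.queue) <= self.limit:
            self.over_limit_since = None
            return True
        if self.over_limit_since is None:
            self.over_limit_since = now
        # Still inside the grace window: tolerate the burst and wait for acks.
        return (now - self.over_limit_since) <= self.burst_grace

    def ack(self, handled_count):
        """The client acked `handled_count` stanzas; drop them from the queue."""
        for _ in range(min(handled_count, len(self.queue))):
            self.queue.popleft()
        if len(self.queue) <= self.limit:
            self.over_limit_since = None
```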
HolgerI mean if you do nothing it's absolutely trivial to OOM-kill the entire server.
Ge0rGHolger: BTDT
ZashGe0rG: Wasn't that a feedback loop tho?
Ge0rGZash: yes, but a queue limit would have prevented it
KevIt's not clear to me how you solve that problem reasonably.
Ge0rGKeep an eye on the largest known MUCs, and make the limit slightly larger than the combined occupant count of the top 5 rooms
Ge0rGAnd by MUCs, I also mean bridged rooms of any sort
HolgerGet rid of stream management :-)
ZashGet rid of the queue
Ge0rGGet rid of clients
KevI think this is only related to stream management, no? You end up with a queue somewhere?
ZashYes! Only servers!
jonas’Zash, peer to peer?
ZashNOOOOOOOO
Ge0rGHolger, Zash: we could implement per-JID s2s backpressure
ZashWell no, but yes
HolgerKev: You end up with stored MAM messages.
Ge0rGs2s 0198, but scoped to individual JIDs
Ge0rGalso that old revision that allowed requesting throttling from the remote end
ZashYou could make it so that resumption is not possible if there's more unacked stanzas than a (smaller) queue size
ZashAt some point it's going to be just as expensive to start over with a fresh session
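A sketch of Zash's two-threshold idea, with invented numbers: a smaller limit beyond which the queue is dropped and resumption refused (the client starts a fresh session), and a larger hard limit that closes the session outright.

```python
RESUME_LIMIT = 500    # hypothetical: max unacked stanzas kept for resumption
KILL_LIMIT = 5000     # hypothetical: hard cap before the session is closed

def queue_policy(unacked: int) -> str:
    if unacked > KILL_LIMIT:
        return "close-session"         # protect the server from OOM
    if unacked > RESUME_LIMIT:
        return "drop-queue-no-resume"  # cheaper to start over than to replay
    return "keep"

assert queue_policy(100) == "keep"
assert queue_policy(2000) == "drop-queue-no-resume"
assert queue_policy(10000) == "close-session"
```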
Ge0rGZash: a client that auto-joins a big MUC on connect will surely cope with such invisible limits
HolgerWhere you obviously might want to implement some sort of disk storage quota, but that's less likely to be too small for clients to cope. Also the burst is often just presence stanzas, which we might be able to reduce/avoid some way.
ZashSoooo, presence based MUC is the problem yet again
HolgerAnyway, until you guys fixed all these things for me, I'll want to have a queue size limit :-)
ZashI remember discussing MUC optimizations, like skipping most initial presence for large channels
Ge0rGwe need incremental presence updates.
Holgerejabberd's room config has an "omit that presence crap altogether" knob. I think p1 customers usually press that and then things suddenly work.
etaisn't there a XEP for room presence list deltas
etaI also don't enjoy getting megabytes of presence upon joining all the MUCs
Zasheta: Yeah, XEP-0436 MUC presence versioning
etadoes anyone plan on implementing it?
ZashI suspect someone is. Not me tho, not right now.
ZashHaving experimented with presence deduplication, I got the feeling that every single presence stanza is unique, making deltas pretty large
etaoh gods
ZashAnd given the rate of presence updates in the kind of MUC where you'd want optimizations... not sure how much deltas will help.
HolgerYeah I was wondering about the effectiveness for large rooms as well.
ZashJust recording every presence update and replaying it like MAM sure won't do. Actual diff will be better, but will it be enough?
ZashWould be nice to have some kind of numbers
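One way to get "some kind of numbers": record occupant-presence snapshots and measure how big a naive per-occupant diff between two points in time would be. This is purely a measurement sketch; nothing here is XEP-0436 wire format.

```python
def presence_delta(old: dict, new: dict) -> dict:
    """Both dicts map occupant nick -> serialized presence stanza (a string)."""
    delta = {}
    for nick, stanza in new.items():
        if old.get(nick) != stanza:
            delta[nick] = stanza          # joined, or presence changed
    for nick in old:
        if nick not in new:
            delta[nick] = None            # left the room
    return delta

old = {"eta": "<presence .../>", "Zash": "<presence show='away'/>"}
new = {"eta": "<presence .../>", "Zash": "<presence/>", "Ge0rG": "<presence/>"}
d = presence_delta(old, new)
print(len(d), "of", len(new), "occupants changed")  # 2 of 3
```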
Ge0rGSo we need to split presence into "room membership updates" and "live user status updates"?
ZashMIX?
ZashAffiliation updates and quits/joins are easy enough to separate
Ge0rGand then we end up with matrix-style rooms, and some clients joining and leaving the membership all the time
ZashSo we have affiliations, currently present nicknames (ie roles) and presence updates
ZashI've been thinking along the lines of that early CSI presence optimizer, where you'd only send presence for "active users" (spoke recently or somesuch). Would be neat to have a summary-ish stanza saying "I just sent you n out of m presences"
ZashYou could also ignore pure presence updates from unaffiliated users and that kind of thing
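A sketch of the "active users only" join optimization described above: forward initial presence only for occupants who spoke recently, plus a hypothetical summary of how many were sent out of the total. The activity window is an assumption, not something from a XEP.

```python
import time

ACTIVE_WINDOW = 15 * 60  # seconds; an assumption, not from any XEP

def initial_presence(occupants, last_spoke, now=None):
    """occupants: list of nicks; last_spoke: nick -> unix timestamp of last message."""
    now = now or time.time()
    active = [n for n in occupants
              if now - last_spoke.get(n, 0) <= ACTIVE_WINDOW]
    summary = {"sent": len(active), "total": len(occupants)}
    return active, summary

occupants = ["eta", "Zash", "Ge0rG", "lurker1", "lurker2"]
last_spoke = {"eta": time.time() - 60, "Zash": time.time() - 300}
active, summary = initial_presence(occupants, last_spoke)
print(active, summary)   # ['eta', 'Zash'] {'sent': 2, 'total': 5}
```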
Ge0rGalso you only want to know the total number of users and the first page full of them, the other ones aren't displayed anyway ;)
ZashYeah
flowZash> Soooo, presence based MUC is the problem yet again
I think the fundamental design problem is pushing stanzas instead of recipients requesting them.
Think for example of a participant in a high-traffic MUC using a low-throughput connection (e.g. GSM). That MUC could easily kill the participant's connection
ZashYou do request them by joining.
flowZash, sure, let me clarify: requesting them in smaller batches (e.g. MAM pagination style)
ZashYou just described how Matrix works btw
flowI did not know that, but it appears like one (probably sensible) solution to the flow control / traffic management problem we have
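A client-side sketch of that idea: pull catch-up traffic in bounded pages instead of having it all pushed at once. `fetch_page` stands in for a MAM query with RSM paging; all names here are placeholders, not a real library API.

```python
def catch_up(fetch_page, last_seen_id, page_size=50):
    """Pull history in bounded pages until the archive says we're caught up."""
    after = last_seen_id
    while True:
        page = fetch_page(after=after, max_items=page_size)
        for msg in page["messages"]:
            print("handling", msg)       # process at the client's own pace
        if page["complete"]:             # mirrors MAM's <fin complete='true'/>
            return
        after = page["last_id"]

# Tiny in-memory stand-in for an archive, just to make the sketch runnable.
ARCHIVE = [f"msg-{i}" for i in range(1, 8)]

def fake_fetch_page(after, max_items):
    start = 0 if after is None else ARCHIVE.index(after) + 1
    chunk = ARCHIVE[start:start + max_items]
    return {"messages": chunk,
            "last_id": chunk[-1] if chunk else after,
            "complete": start + max_items >= len(ARCHIVE)}

catch_up(fake_fetch_page, last_seen_id=None, page_size=3)
```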
jonas’or like MIX ;D
Ge0rGlet's just do everything in small batches.
flowcorrect me if I am wrong, but MIX's default modus operandi is still to fan-out all messages
jonas’I think only if you subscribe to messages
jonas’also, I thought we were talking about *presence*, not messages.
flowI think the stanza kind does not matter
flowif someone sends you stanzas at a higher rate than you can consume, some intermediate queue will fill
jonas’yeah, well, that’s true for everything
flowhence I wrote "fundamental design problem"
jonas’I can see the case for MUC/MIX presence because that’s a massive amplification (you send a single presence, you get a gazillion back plus a continuous stream)
jonas’yeah, no, I don’t believe in polling for messages
KevThe main issue is catchup.
jonas’if you’re into that kind of stuff, use BOSH
flowI did not say anything about polling
KevWhether when you join you receive a flood of everything, or whether you request stuff when you're ready for it, in batches.
KevUsing MAM on MIX is meant to give you the latter.
flowand yes, the problem is more likely caused by presence stanzas, but could be caused by IQs or messages as well
KevIf you have a room that is itself generating 'live' stanzas at such a rate that it fills queues, that is also a problem, but is distinct from the 'joining lots of MUCs disconnects me' problem.
flowKev, using the user's MAM service or the MIX channel's MAM service?
KevBoth use the same paging mechanic.
jonas’12:41:06 flow1> Zash, sure, let me clarify: requesting them in smaller batches (e.g. MAM pagination style)
how is that not polling then?
jonas’though I sense that this is a discussion about semantics I don’t want to get into right now.
flowright, I wanted to head towards the question on how to be notified that there are new messages that you may want to request
jonas’by receiving a <message/> with the message.
flowthat does not appear to be a solution, as you easily run into the same problem
jonas’[citation needed]
flowI was thinking more along the lines of infrequent/slightly delayed notifications with the current stanza/message head IDs
HolgerMAM/Sub!
flowbut then again, it does not appear to be an elegant solution (or potentially is no solution at all)
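A sketch of the head-ID notification idea: the server occasionally sends only the newest archive ID per source, and the client decides when, and how fast, to fetch the gap. Every name here is hypothetical.

```python
class HeadIdTracker:
    def __init__(self):
        self.seen = {}        # source JID -> last archive ID we processed

    def on_head_notification(self, source, head_id, fetch_range):
        """Called when the server says 'the newest ID for `source` is `head_id`'."""
        last = self.seen.get(source)
        if last == head_id:
            return            # nothing new, no traffic needed
        # Fetch only when the client is ready; fetch_range stands in for a
        # MAM query bounded by (last, head_id].
        for msg in fetch_range(source, start=last, end=head_id):
            self.seen[source] = msg["id"]

t = HeadIdTracker()
t.on_head_notification("room@muc.example", "id-42",
                       lambda source, start, end: [{"id": "id-41"}, {"id": "id-42"}])
print(t.seen)   # {'room@muc.example': 'id-42'}
```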
ZashOh, this is basically the same problem as IP congestion, is it not?
ZashAnd the way to solve that is to throw data away. Enjoy telling your users that.
Zash> The main issue is catchup.
This. So now you'll have to figure out what data got thrown away and fetch it.
Zash(Also how Matrix works.)
etathe one thing that may be good to steal from matrix is push rules
etai.e. some server side filtering you can do to figure out what should generate a push notification
ZashCan you rephrase that in a way that doesn't make me want to say "but they stole this from us"
etawell so CSI filtering is an XMPP technology, right
etabut there's no API to extend it
etalike you can't say "please send me everything matching the regex /e+ta/"
Zash"push rules" meaning what, exactly?
pep.Zash: it's just reusing good ideas :p
ZashYou said "push notifications", so I assumed "*mobile* push notifications"
Ge0rGZash: a filter that the client can define to tell the server what's "important"
ZashAMP?
etaZash, so yeah, push rules are used for mobile push notifications in Matrix
ZashPush a mod_firewall script? 🙂
Ge0rGfor push notifications, the logic is in the push server, which is specific to the client implementation
Zasheta: So you mean user-configurable rules?
etaZash, yeah
Ge0rGnot rather client-configurable?
etaI mean this is ultimately flawed anyway because e2ee is a thing
ZashEverything is moot because E2EE
Ge0rGI'm pretty sure there is no place in matrix where you can enter push rule regexes
pulkomandyIs the problem really to be solved on the client-server link? What about some kind of flow control on the s2s side instead? (no idea about the s2s things in xmpp, so maybe that's not doable)
etaGe0rG, tada https://matrix.org/docs/spec/client_server/r0.6.1#m-push-rules
ZashGe0rG: Keywords tho, which might be enough
etayou can have a "glob-style pattern"
ZashUgh
Ge0rGeta: that's not what I mean
Ge0rGeta: show me a Riot screenshot where you can define those globs
etaGe0rG, hmm, can't you put them into the custom keywords field
pulkomandyIf you try to solve it on the client side you will invent something like TCP windows. Which is indeed a way to solve IP congestion. And it doesn't work here because congestion on the server-to-client socket doesn't propagate to other links
etadoesn't really care about this argument though and is very willing to just concede to Ge0rG :p
ZashWhat was that thing in XEP-0198 that got removed? Wasn't that rate limiting?
Ge0rGZash: yes
etaI think the presence-spam-in-large-MUCs issue probably needs some form of lazy loading, right
etalike, send user presence before they talk
etahave an API (probably with RSM?) to fetch all user presences
Zasheta: Yeah, that's what I was thinking
etathe matrix people had pretty much this exact issue and solved it the same way
ZashOh no, then we need to do it differently!!11!!11!!1 eleven
etaZash, it's fine, they use {} brackets and we'll use <> ;P
ZashPhew 😃
etathe issue with lots of messages in active MUCs is more interesting though
etalike for me, Conversations chews battery because I'm in like 6-7 really active IRC channels
etaso my phone never sleeps
etaI've been thinking I should do some CSI filtering, but then the issue is you fill up the CSI queue
ZashA thing I've almost started stealing from Matrix is room priorities.
ZashSo I have a command where I can mark public channels as low-priority, and then nothing from those gets pushed through CSI
Ge0rGeta: the challenge here indeed is that all messages will bypass CSI, which is not perfect
etaZash, yeah, there's that prosody module for that
Ge0rGeta: practically speaking, you might want to have a wordlist that MUC messages must match to be pushed
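A sketch combining Zash's room priorities with Ge0rG's wordlist: a CSI-style filter that lets messages from low-priority rooms through only on a keyword match. The room list, keywords and stanza shape are illustrative, not mod_csi internals.

```python
LOW_PRIORITY_ROOMS = {"ubuntu@irc.example", "python@irc.example"}  # assumption
KEYWORDS = {"eta"}                                                 # assumption

def should_push(stanza: dict) -> bool:
    """Decide whether a queued stanza should wake the mobile client."""
    if stanza.get("type") != "groupchat":
        return True                      # direct traffic always goes through
    room = stanza.get("from_bare", "")
    if room not in LOW_PRIORITY_ROOMS:
        return True                      # normal rooms keep default behaviour
    body = (stanza.get("body") or "").lower()
    # Naive substring match; a real filter would use word boundaries or globs.
    return any(word in body for word in KEYWORDS)

print(should_push({"type": "groupchat", "from_bare": "ubuntu@irc.example",
                   "body": "hey eta, ping"}))      # True (keyword match)
print(should_push({"type": "groupchat", "from_bare": "ubuntu@irc.example",
                   "body": "unrelated chatter"}))  # False (stays queued)
```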
etaI almost feel like the ideal solution is something more like
etaI want the server to join the MUC for me
etaI don't want my clients to join the MUC (disable autojoin in bookmarks)
etaand if I get mentioned or something, I want the server to somehow forward the mentioned message
Ge0rGeta: your client still needs to get all the MUC data, eventually
etaGe0rG, sure
etabut, like, I'll get the forwarded message with the highlight
etathen I can click/tap on the MUC to join it
Ge0rGeta: so CSI with what Zash described is actually good
etaand then use MAM to lazy-paginate
etaGe0rG, yeah, but it fills up in-memory queues serverside
Ge0rGeta: but I think that command is too magic for us mortals
Ge0rGeta: yes, but a hundred messages isn't much in the grand scheme of things
etaGe0rG, a hundred is an underestimate ;P
etasome of the IRC channels have like 100 messages in 5 minutes or something crazy
etaZash, also ideally that prosody module would use bookmarks
etainstead of an ad-hoc command
Ge0rGeta: naah
ZashBookmarks2 with a priority extension would be cool
Ge0rGwe need a per-JID notification preference, like "never" / "always" / "on mention" / "on string match"
Ge0rGwhich is enforced by the server
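A sketch of that per-JID preference ("never" / "always" / "on mention" / "on string match"), evaluated server-side before a message may cause a push or CSI flush. The storage format and the example JIDs are assumptions.

```python
PREFS = {
    "bigroom@muc.example": ("on mention", None),
    "noisy@muc.example":   ("never", None),
    "work@muc.example":    ("on string match", "deploy"),
    "friend@example.com":  ("always", None),
}

def notify(sender_bare: str, body: str, my_nick: str) -> bool:
    mode, needle = PREFS.get(sender_bare, ("always", None))
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode == "on mention":
        return my_nick.lower() in body.lower()
    if mode == "on string match":
        return bool(needle) and needle.lower() in body.lower()
    return True  # unknown mode: fail open

print(notify("bigroom@muc.example", "Ge0rG: ping", "Ge0rG"))  # True
print(notify("noisy@muc.example", "Ge0rG: ping", "Ge0rG"))    # False
```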
etaGe0rG: that's a different thing though
Ge0rGeta: is it really?
Ge0rGeta: for mobile devices, CSI-passthrough is only relevant for notification causing messages
etaGe0rG: ...actually, yeah, I agree
Ge0rGyou want to get pushed all the messages that will trigger a notification
Ge0rGwhich ironically means that all self-messages get pushed through so that the mobile client can *clear* notifications
Ge0rGwhich ironically also pushes outgoing Receipts
Ge0rGeta: I'm sure I've written a novel or two on standards regarding that
Ge0rGor maybe just in the prosody issue tracker
Ge0rGeta: also CSI is currently in Last Call, so feel free to add your two cents
ZashIronically?
Ge0rGisn't going to re-post his "What's Wrong with XMPP" slide deck again
Ge0rGAlso the topic of notification is just a TODO there.
ZashHeh
Zash> you want to get pushed all the messages that will trigger a notification
and that's roughly the same set that you want archived and carbon'd, I think, but not exactly
etaGe0rG: wait that sounds like an interesting slide deck
etaZash: wild idea, just maintain a MAM archive for "notifications"
etaI guess a pubsub node would also work
etaand you shove all said "interesting" messages in there
ZashStuff that causes the CSI queue to get flushed? Most of that'll be in MAM already.
etaZash: well mentions really
Ge0rGeta: MAM doesn't give you push though
etaGe0rG: okay, after reading those slides I'd say that's a pretty good summary and proposal
Ge0rGeta: all it needs is somebody to implement all the moving parts
ZashBreak it into smaller (no, even smaller!) pieces and file feature requests?
Ge0rGwhen I break it into such small pieces, the context gets lost
Ge0rGlike just now I realized there might be some smarter way to handle "sent" carbons in CSI, than just passing all through
ZashOne huge "do all these things" isn't great either
Ge0rGbut maybe a sent carbon of a Receipt isn't too bad after all because it most often comes short after the original message that also pierced CSI?
Ge0rGdid I mention that I'm collecting large amounts of data on the number of and reasons for CSI wakeups?
ZashPossibly
Ge0rGand that the #1 reason used to be disco#info requests to the client?
ZashPossibly (re carbon-receipts)
ZashDid I mention that I too collected stats on that, until I discovered that storing stats murdered my server?
Ge0rGI'm only "storing" them in prosody.log, and that expires after 14d
Ge0rGbut maybe somebody wants to bring them to some use?
Zashdisco#info cache helped a *lot* IIRC
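A sketch of what such a disco#info cache can look like: remember one disco#info result per entity-caps ver string and answer repeated queries server-side instead of waking the sleeping client. The structure is illustrative only.

```python
class DiscoInfoCache:
    def __init__(self):
        self.by_caps_ver = {}   # caps ver string -> disco#info result

    def store(self, caps_ver: str, disco_result):
        self.by_caps_ver[caps_ver] = disco_result

    def answer(self, caps_ver: str):
        """Return a cached result, or None to fall through to the client."""
        return self.by_caps_ver.get(caps_ver)

cache = DiscoInfoCache()
cache.store("q07IKJEyjvHSyhy//CH0CxmKi8w=", {"features": ["urn:xmpp:csi:0"]})
print(cache.answer("q07IKJEyjvHSyhy//CH0CxmKi8w="))
```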
ZashI also found that a silly amount of wakeups were due to my own messages on another device, after which I wrote a grace period thing for that.
ZashIIRC before I got rid of stats collection it was mostly client-initiated wakeups that triggered CSI flushes
Ge0rGZash: "own messages on other device" needs some kind of logic maybe
Ge0rGlike: remember the last message direction per JID, only wake up on outgoing read-marker / body when direction changes?
ZashGe0rG: Consider me, writing here, right now, on my work station. Groupchat messages sent to my phone.
Ge0rGjust waking up on outgoing read-marker / body would be a huge improvement already
Ge0rGZash: yes, that groupchat message is supposed to clear an eventual notification for the groupchat
Ge0rGthat = your
ZashAfter the grace period ends, if there were anything high-priority since the last activity from that other client, then it should push.
ZashNot done that yet tho I think
ZashBut as long as I'm active at another device, pushing to the phone is of no use
ZashTricky to handle the case of an incoming message just after typing "brb" and grabbing the phone to leave
ZashEspecially with a per-stanza yes/no/maybe function, it'll need a "maybe later" response
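A sketch of the grace-period behaviour discussed above, including the "maybe later" outcome: while the user is active elsewhere, important stanzas are parked; when the grace period expires, the phone is woken only if something important queued up. The timeout is an arbitrary assumption.

```python
import time

GRACE = 5 * 60  # seconds of "active elsewhere" before the phone is woken (made up)

class GracePeriod:
    def __init__(self):
        self.last_other_activity = 0.0
        self.pending_important = []

    def on_other_device_activity(self, now=None):
        self.last_other_activity = now or time.monotonic()

    def decide(self, stanza, important: bool, now=None) -> str:
        now = now or time.monotonic()
        if not important:
            return "no"                    # never worth a wakeup on its own
        if now - self.last_other_activity < GRACE:
            self.pending_important.append(stanza)
            return "maybe later"           # re-evaluate when the grace period ends
        return "yes"

    def on_grace_expired(self) -> str:
        """Flush if something important queued up while the user was away."""
        return "yes" if self.pending_important else "no"

g = GracePeriod()
g.on_other_device_activity()
print(g.decide("mention from bigroom", important=True))   # 'maybe later'
```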
Ge0rGZash: yeah. Everything is HARD
etaalso, for all Slack's complicated diagrams, their notifications don't even work properly either
etalike it doesn't dismiss them on my phone, etc
flow Zash> And the way to solve that is to throw data away. Enjoy telling your users that.
I'd say that's why there is TCP on top of IP (which is where, I'd argue, the actual congestion and traffic flow control happens)
Zashflow: With TCP, same as XMPP, you just end up filling up buffers and getting OOM'd
flowZash, I don't think those two are really comparable: with tcp you have exactly two endpoints, with xmpp one entity communicates potentially with multiple endpoints (potentially over multiple different s2s links)
Zash(me says nothing about mptcp)
ZashSo what Ge0rG said about slowing down s2s links?
flowI did not read the full backlog, could you summarize what Ge0rG said?
flow(otherwise I have to read it first)
Zash13:31:21 Ge0rG> Holger, Zash: we could implement per-JID s2s backpressure
flowbesides, aren't there still only two endpoints involved in MPTCP (just potentially using multiple paths)?
flowI am not sure if that is technically possible, the "per-JID" part here alone could be tricky
flowit appears that implementing backpressure would likely involve signalling back to the sender, but what if the path to the sender is also congested?
ZashI'm not sure this is even doable without affecting other users of that s2s link
flowas of now, the only potential solution I could come up with is keeping the state server-side and having servers notify clients when the state changes, so that clients can sync whenever they want, and especially as fast as they want
flowbut that does not solve the problem for servers with poor connectivity
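A sketch of what "per-JID s2s backpressure" could mean in practice: a token bucket per destination JID on the outgoing link, so one flooded recipient is slowed down without stalling everyone else on the same s2s connection. Nothing like this exists in XEP-0198 today; the rates are invented.

```python
import time

class PerJidBucket:
    def __init__(self, rate=10.0, burst=50):
        self.rate = rate          # stanzas per second refill (made up)
        self.burst = burst        # bucket capacity (made up)
        self.tokens = {}          # jid -> (tokens, last refill time)

    def allow(self, jid: str, now=None) -> bool:
        now = now or time.monotonic()
        tokens, last = self.tokens.get(jid, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.tokens[jid] = (tokens, now)
            return False          # back pressure: hold or queue locally for this JID
        self.tokens[jid] = (tokens - 1, now)
        return True

bucket = PerJidBucket(rate=1.0, burst=2)
print([bucket.allow("user@example.com") for _ in range(4)])  # [True, True, False, False]
```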
jonas’let’s change xmpp-s2s to websockets / http/3 or whatever which supports multiple streams and will of course solve the scheduling issue of streams competing for resources and not at all draw several CVE numbers in that process :)
ZashNot impossible to open more parallel s2s links...
jonas’one for each JID? :)
jonas’one for each local JID? :)
ZashHeh, you could open a secondary one for big bursts of stanzas like MUC joins and MAM ....
ZashLike I think there were thoughts in the past about using a secondary client connection for vcards
jonas’haha wat
ZashOpen 2 c2s connections. Use one as normal (presence, chat etc.), except send some requests, like vcard fetches, over the other one, since they often contain big binary blobs that then wouldn't block the main connection :)
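A toy sketch of that two-connection idea: route blob-carrying IQs (vcards, avatar data) over a secondary stream so they cannot block presence and chat on the primary one. The payload list and send callbacks are placeholders.

```python
BULKY_PAYLOADS = {"vcard-temp", "urn:xmpp:avatar:data"}  # assumption

def route(stanza: dict, send_primary, send_secondary):
    """Pick a stream for an outgoing stanza based on its payload."""
    if stanza.get("kind") == "iq" and stanza.get("payload_ns") in BULKY_PAYLOADS:
        send_secondary(stanza)   # big binary blobs go over the side channel
    else:
        send_primary(stanza)     # presence, messages, small IQs stay here

route({"kind": "iq", "payload_ns": "vcard-temp"},
      send_primary=lambda s: print("primary", s),
      send_secondary=lambda s: print("secondary", s))
```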
pulkomandyWell… at this point you may start thinking about removing tcp (its flow control doesn't work in this case anyway) and do something xml over udp instead?