mod_smacks did have such limits for a short time, but they caused exactly this problem and were then removed until someone can come up with a better way to deal with it
Ge0rG
was that the thing that made my server explode due to an unbounded smacks queue? ;)
Zash
I got the impression that yesterday's discussion was about the opposite problem, killing the session once the queue is too large
Zash
So, no, not that issue.
lovetox
it turned out the user who reported that problem had a queue size of 1000
Ge0rG
what lovetox describes sounds like a case of too much burst traffic, not of socket synchronization issues
lovetox
the current ejabberd default is much higher
lovetox
but a few versions ago it was 1000
Ge0rG
if you join a dozen MUCs at the same time, you might well run into a 1000 stanza limit
lovetox
1000 is like nothing
lovetox
you can't even join one IRC room like #ubuntu or #python
Ge0rG
lovetox: only join MUCs once at a time ;)
lovetox
you get instantly disconnected
Ge0rG
Matrix HQ
Ge0rG
unless the bridge is down ;)
lovetox
the current ejabberd default is 5000
lovetox
which until now works ok
Ge0rG
I'm sure the Matrix HQ has more users. But maybe it's slow enough in pushing them over the bridge that you can fetch them from your server before you are killed
Kev
Ge0rG: Oh, you're suggesting that it's a kill based on timing out an ack because the server is ignoring that it's reading stuff from the socket that's acking old stanzas, rather than timing out on data not being received?
Kev
That seems plausible.
Ge0rG
Kev: I'm not sure it has to do with the server's reading side of the c2s socket at all
Holger
<- still trying to parse that sentence :-)
Zash
Dunno how ejabberd works but that old mod_smacks version just killed the session once it hit n queued stanzas.
Ge0rG
Holger: I'm not sure I understood it either
Ge0rG
Kev: I think it's about N stanzas suddenly arriving for a client, with N being larger than the maximum queue size
Holger
Zash: That's how ejabberd works. Yes that's problematic, but doing nothing is even more so, and so far I see no better solution.
Kev
Ah. I understand now, yes.
Ge0rG
Holger: you could add a time-based component to that, i.e. allow short bursts to exceed the limit
Ge0rG
give the client a chance to consume the burst and to ack it
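A minimal sketch of the time-based limit Ge0rG suggests, in Python; the queue may briefly exceed its cap, and the session is only killed if the client fails to ack the burst within a grace window. All class names and thresholds here are hypothetical, not mod_smacks' or ejabberd's actual API:

```python
import time

class SmacksQueue:
    """Unacked-stanza queue with a burst-tolerant size limit (hypothetical sketch)."""

    def __init__(self, soft_limit=1000, burst_window=30.0):
        self.stanzas = []
        self.soft_limit = soft_limit      # max queued stanzas under normal load
        self.burst_window = burst_window  # seconds a burst may exceed the limit
        self.over_limit_since = None      # when the queue first exceeded the limit

    def push(self, stanza, now=None):
        """Queue a stanza; return False if the session should be killed."""
        now = now or time.monotonic()
        self.stanzas.append(stanza)
        if len(self.stanzas) <= self.soft_limit:
            self.over_limit_since = None
            return True
        if self.over_limit_since is None:
            self.over_limit_since = now   # burst starts; give the client a chance
        # Kill only if the client failed to ack the burst within the window.
        return (now - self.over_limit_since) <= self.burst_window

    def ack(self, count):
        """Client acked `count` stanzas; drop them from the queue."""
        del self.stanzas[:count]
        if len(self.stanzas) <= self.soft_limit:
            self.over_limit_since = None
```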
Holger
I mean if you do nothing it's absolutely trivial to OOM-kill the entire server.
Ge0rG
Holger: BTDT
Zash
Ge0rG: Wasn't that a feedback loop tho?
Ge0rG
Zash: yes, but a queue limit would have prevented it
Kev
It's not clear to me how you solve that problem reasonably.
Ge0rG
Keep an eye on the largest known MUCs, and make the limit slightly larger than the summed occupant counts of the top 5 rooms
Ge0rG
And by MUCs, I also mean bridged rooms of any sort
Holger
Get rid of stream management :-)
Zash
Get rid of the queue
Ge0rG
Get rid of clients
Kev
I think this is only related to stream management, no? You end up with a queue somewhere?
Zash
Yes! Only servers!
jonas’
Zash, peer to peer?
Zash
NOOOOOOOO
Ge0rG
Holger, Zash: we could implement per-JID s2s backpressure
Zash
Well no, but yes
Holger
Kev: You end up with stored MAM messages.
Ge0rG
s2s 0198, but scoped to individual JIDs
Ge0rG
also that old revision that allowed you to request throttling from the remote end
Zash
You could make it so that resumption is not possible if there's more unacked stanzas than a (smaller) queue size
Zash
At some point it's going to be just as expensive to start over with a fresh session
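A sketch of Zash's two-threshold idea, with hypothetical names and numbers: keep the large kill limit (5000 is ejabberd's current default, per lovetox above), but give up XEP-0198 resumption state at a much smaller backlog, since replaying that many stanzas costs about as much as a fresh session:

```python
class Session:
    """Minimal stand-in for a c2s session with an unacked-stanza queue."""
    def __init__(self):
        self.unacked = []
        self.resumable = True   # whether XEP-0198 resumption is still offered

KILL_LIMIT = 5000    # hypothetical hard cap (ejabberd's current default)
RESUME_LIMIT = 500   # hypothetical smaller cap: beyond this, refuse resumption

def check_queue(session: Session) -> bool:
    """Return False if the session should be dropped outright."""
    n = len(session.unacked)
    if n > RESUME_LIMIT:
        # Resuming would replay a huge backlog; cheaper to start over fresh.
        session.resumable = False
    return n <= KILL_LIMIT
```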
Ge0rG
Zash: a client that auto-joins a big MUC on connect will surely cope with such invisible limits
Holger
Where you obviously might want to implement some sort of disk storage quota, but that's less likely to be too small for clients to cope. Also the burst is often just presence stanzas, which we might be able to reduce/avoid some way.
Zash
Soooo, presence based MUC is the problem yet again
Holger
Anyway, until you guys have fixed all these things for me, I'll want to have a queue size limit :-)
Zash
I remember discussing MUC optimizations, like skipping most initial presence for large channels
Ge0rG
we need incremental presence updates.
Holger
ejabberd's room config has an "omit that presence crap altogether" knob. I think p1 customers usually press that and then things suddenly work.
eta
isn't there a XEP for room presence list deltas
eta
I also don't enjoy getting megabytes of presence upon joining all the MUCs
Zash
eta: Yeah, XEP-0436 MUC presence versioning
eta
does anyone plan on implementing it?
Zash
I suspect someone is. Not me tho, not right now.
Zash
Having experimented with presence deduplication, I got the feeling that every single presence stanza is unique, making deltas pretty large
eta
oh gods
Zash
And given the rate of presence updates in the kind of MUC where you'd want optimizations... not sure how much deltas will help.
Holger
Yeah I was wondering about the effectiveness for large rooms as well.
Zash
Just recording every presence update and replaying it like MAM sure won't do. Actual diff will be better, but will it be enough?
Zash
Would be nice to have some kind of numbers
Ge0rG
So we need to split presence into "room membership updates" and "live user status updates"?
Zash
MIX?
Zash
Affiliation updates and quits/joins are easy enough to separate
Ge0rG
and then we end up with matrix-style rooms, and some clients joining and leaving the membership all the time
Zash
So we have affiliations, currently present nicknames (ie roles) and presence updates
Zash
I've been thinking along the lines of that early CSI presence optimizer, where you'd only send presence for "active users" (spoke recently or somesuch). Would be neat to have a summary-ish stanza saying "I just sent you n out of m presences"
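A sketch of that heuristic under hypothetical names: on join, forward presence only for occupants who spoke recently, then append a made-up summary element saying "I just sent you n out of m presences". The namespace and threshold are invented for illustration:

```python
import time

ACTIVE_WINDOW = 15 * 60  # seconds counting as "spoke recently" (hypothetical)

def initial_presences(occupants, last_spoke, now=None):
    """Yield presence only for recently active occupants, then a summary.

    occupants:  dict nick -> presence stanza (opaque here)
    last_spoke: dict nick -> unix timestamp of that nick's last message
    """
    now = now or time.time()
    sent = 0
    for nick, presence in occupants.items():
        if now - last_spoke.get(nick, 0) <= ACTIVE_WINDOW:
            sent += 1
            yield presence
    # Hypothetical summary element: "n out of m presences were sent".
    yield (f'<x xmlns="urn:example:presence-summary" '
           f'sent="{sent}" total="{len(occupants)}"/>')
```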
Zash
You could also ignore pure presence updates from unaffiliated users and that kind of thing
Ge0rG
also you only want to know the total number of users and the first page full of them, the other ones aren't displayed anyway ;)
Zash
Yeah
flow
Zash> Soooo, presence based MUC is the problem yet again
I think the fundamental design problem is pushing stanzas instead of recipients requesting them.
Think, for example, of a participant in a high-traffic MUC using a low-throughput connection (e.g. GSM). That MUC could easily kill the participant's connection
Zash
You do request them by joining.
flow
Zash, sure, let me clarify: requesting them in smaller batches (e.g. MAM pagination style)
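For illustration, the client-paced pattern flow describes: the consumer pulls bounded pages instead of the server pushing everything at once. `query_page` is a hypothetical stand-in for a MAM query with XEP-0059 result-set paging:

```python
def fetch_in_batches(query_page, page_size=50):
    """Walk an archive in bounded batches, RSM-style (XEP-0059/0313 sketch).

    query_page(after_id, limit) is assumed to return
    (items, last_id, complete) for one page of the archive.
    """
    after = None
    while True:
        items, after, complete = query_page(after, page_size)
        for item in items:
            yield item          # consume one page before asking for the next
        if complete:
            break               # archive exhausted; no flood, no giant queue
```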
Zash
You just described how Matrix works btw
flow
I did not know that, but it appears like one (probably sensible) solution to the flow control / traffic management problem we have
jonas’
or like MIX ;D
Ge0rG
let's just do everything in small batches.
flow
correct me if I am wrong, but MIX's default modus operandi is still to fan out all messages
jonas’
I think only if you subscribe to messages
jonas’
also, I thought we were talking about *presence*, not messages.
flow
I think the stanza kind does not matter
flow
if someone sends you stanzas at a higher rate than you can consume, some intermediate queue will fill
jonas’
yeah, well, that’s true for everything
flow
hence I wrote "fundamental design problem"
jonas’
I can see the case for MUC/MIX presence because that's a massive amplification (you send a single presence, you get a gazillion and a continuous stream back)
jonas’
yeah, no, I don’t believe in polling for messages
Kev
The main issue is catchup.
jonas’
if you’re into that kind of stuff, use BOSH
flow
I did not say anything about polling
Kev
Whether when you join you receive a flood of everything, or whether you request stuff when you're ready for it, in batches.
Kev
Using MAM on MIX is meant to give you the latter.
flow
and yes, the problem is more likely caused by presence stanzas, but could be caused by IQs or messages as well
Kev
If you have a room that is itself generating 'live' stanzas at such a rate that it fills queues, that is also a problem, but is distinct from the 'joining lots of MUCs disconnects me' problem.
flow
Kev, using the user's MAM service or the MIX channel's MAM service?
Kev
Both use the same paging mechanic.
jonas’
12:41:06 flow1> Zash, sure, let me clarify: requesting them in smaller batches (e.g. MAM pagination style)
how is that not polling then?
jonas’
though I sense that this is a discussion about semantics I don’t want to get into right now.
flow
right, I wanted to head towards the question on how to be notified that there are new messages that you may want to request
jonas’
by receiving a <message/> with the message.
flow
that does not appear to be a solution, as you easily run into the same problem
jonas’
[citation needed]
flow
I was thinking more along the lines of infrequent/slightly delayed notifications with the current stanza/message head ID(s)
Holger
MAM/Sub!
flow
but then again, it does not appear to be an elegant solution (or potentially is no solution at all)
Zash
Oh, this is basically the same problem as IP congestion, is it not?
Zash
And the way to solve that is to throw data away. Enjoy telling your users that.
Zash
> The main issue is catchup.
This. So now you'll have to figure out what data got thrown away and fetch it.
Zash
(Also how Matrix works.)
eta
the one thing that may be good to steal from matrix is push rules
eta
i.e. some server side filtering you can do to figure out what should generate a push notification
Zash
Can you rephrase that in a way that doesn't make me want to say "but they stole this from us"
eta
well so CSI filtering is an XMPP technology, right
eta
but there's no API to extend it
eta
like you can't say "please send me everything matching the regex /e+ta/"
Zash
"push rules" meaning what, exactly?
pep.
Zash: it's just reusing good ideas :p
Zash
You said "push notifications", so I assumed "*mobile* push notifications"
Ge0rG
Zash: a filter that the client can define to tell the server what's "important"
Zash
AMP?
eta
Zash, so yeah, push rules are used for mobile push notifications in Matrix
Zash
Push a mod_firewall script? 🙂
Ge0rG
for push notifications, the logic is in the push server, which is specific to the client implementation
Zash
eta: So you mean user-configurable rules?
eta
Zash, yeah
Ge0rG
not rather client-configurable?
eta
I mean this is ultimately flawed anyway because e2ee is a thing
Zash
Everything is moot because E2EE
Ge0rG
I'm pretty sure there is no place in matrix where you can enter push rule regexes
pulkomandy
Is the problem really to be solved on the client-server link? What about some kind of flow control on the s2s side instead? (no idea about the s2s things in xmpp, so maybe that's not doable)
eta
Ge0rG, tada https://matrix.org/docs/spec/client_server/r0.6.1#m-push-rules
Zash
Ge0rG: Keywords tho, which might be enough
eta
you can have a "glob-style pattern"
Zash
Ugh
Ge0rG
eta: that's not what I mean
Ge0rG
eta: show me a Riot screenshot where you can define those globs
eta
Ge0rG, hmm, can't you put them into the custom keywords field
pulkomandy
If you try to solve it on the client side you will invent something like TCP windows, which is indeed a way to solve IP congestion, but doesn't work here because congestion on the server-to-client socket doesn't propagate to other links
eta doesn't really care about this argument though and is very willing to just concede to Ge0rG :p
Zash
What was that thing in XEP-0198 that got removed? Wasn't that rate limiting?
Ge0rG
Zash: yes
eta
I think the presence-spam-in-large-MUCs issue probably needs some form of lazy loading, right
eta
like, only send a user's presence right before they first talk
eta
have an API (probably with RSM?) to fetch all user presences
Zash
eta: Yeah, that's what I was thinking
eta
the matrix people had pretty much this exact issue and solved it the same way
Zash
Oh no, then we need to do it differently!!11!!11!!1 eleven
eta
Zash, it's fine, they use {} brackets and we'll use <> ;P
Zash
Phew 😃
eta
the issue with lots of messages in active MUCs is more interesting though
eta
like for me, Conversations chews battery because I'm in like 6-7 really active IRC channels
eta
so my phone never sleeps
eta
I've been thinking I should do some CSI filtering, but then the issue is you fill up the CSI queue
Zash
A thing I've almost started stealing from Matrix is room priorities.
Zash
So I have a command where I can mark public channels as low-priority, and then nothing from those gets pushed through CSI
Ge0rG
eta: the challenge here indeed is that all messages will bypass CSI, which is not perfect
eta
Zash, yeah, there's that prosody module for that
Ge0rG
eta: practically speaking, you might want to have a wordlist that MUC messages must match to be pushed
eta
I almost feel like the ideal solution is something more like
eta
I want the server to join the MUC for me
eta
I don't want my clients to join the MUC (disable autojoin in bookmarks)
eta
and if I get mentioned or something, I want the server to somehow forward the mentioned message
Ge0rG
eta: your client still needs to get all the MUC data, eventually
eta
Ge0rG, sure
eta
but, like, I'll get the forwarded message with the highlight
eta
then I can click/tap on the MUC to join it
Ge0rG
eta: so CSI with what Zash described is actually good
eta
and then use MAM to lazy-paginate
eta
Ge0rG, yeah, but it fills up in-memory queues serverside
Ge0rG
eta: but I think that command is too magic for us mortals
Ge0rG
eta: yes, but a hundred messages isn't much in the grand scheme of things
eta
Ge0rG, a hundred is an underestimate ;P
eta
some of the IRC channels have like 100 messages in 5 minutes or something crazy
Zash, also ideally that prosody module would use bookmarks
eta
instead of an ad-hoc command
Ge0rG
eta: naah
Zash
Bookmarks2 with a priority extension would be cool
Ge0rG
we need a per-JID notification preference, like "never" / "always" / "on mention" / "on string match"
Ge0rG
which is enforced by the server
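A sketch of such a server-enforced preference table, with all JIDs, rules, and names hypothetical:

```python
import re
from enum import Enum

class Notify(Enum):
    NEVER = "never"
    ALWAYS = "always"
    ON_MENTION = "on mention"
    ON_MATCH = "on string match"

# Hypothetical per-JID preferences of the kind Ge0rG describes.
prefs = {
    "bigroom@muc.example": (Notify.ON_MENTION, None),
    "alerts@muc.example": (Notify.ON_MATCH, re.compile(r"outage|pager")),
    "noisy@muc.example": (Notify.NEVER, None),
}

def should_push(room_jid, body, my_nick, default=Notify.ALWAYS):
    """Decide server-side whether a MUC message escapes the CSI queue."""
    mode, pattern = prefs.get(room_jid, (default, None))
    if mode is Notify.ALWAYS:
        return True
    if mode is Notify.NEVER:
        return False
    if mode is Notify.ON_MENTION:
        return my_nick.lower() in body.lower()
    return bool(pattern and pattern.search(body))
```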
eta
Ge0rG: that's a different thing though
Ge0rG
eta: is it really?
Ge0rG
eta: for mobile devices, CSI-passthrough is only relevant for notification causing messages
eta
Ge0rG: ...actually, yeah, I agree
Ge0rG
you want to get pushed all the messages that will trigger a notification
Ge0rG
which ironically means that all self-messages get pushed through so that the mobile client can *clear* notifications
Ge0rG
which ironically also pushes outgoing Receipts
Ge0rG
eta: I'm sure I've written a novel or two on standards regarding that
Ge0rG
or maybe just in the prosody issue tracker
Ge0rG
eta: also CSI is currently in Last Call, so feel free to add your two cents
Zash
Ironically?
Ge0rG isn't going to re-post his "What's Wrong with XMPP" slide deck again
Ge0rG
Also the topic of notification is just a TODO there.
Zash
Heh
Zash
> you want to get pushed all the messages that will trigger a notification
and that's roughly the same set that you want archived and carbon'd, I think, but not exactly
eta
Ge0rG: wait that sounds like an interesting slide deck
eta
Zash: wild idea, just maintain a MAM archive for "notifications"
eta
I guess a pubsub node would also work
eta
and you shove all said "interesting" messages in there
Zash
Did I mention that I too collected stats on that, until I discovered that storing stats murdered my server?
Ge0rG
I'm only "storing" them in prosody.log, and that expires after 14d
Ge0rG
but maybe somebody wants to put them to some use?
Zash
disco#info cache helped a *lot* IIRC
Zash
I also found that a silly amount of wakeups were due to my own messages on another device, after which I wrote a grace period thing for that.
Zash
IIRC before I got rid of stats collection it was mostly client-initiated wakeups that triggered CSI flushes
Ge0rG
Zash: "own messages on other device" needs some kind of logic maybe
Ge0rG
like: remember the last message direction per JID, only wake up on outgoing read-marker / body when direction changes?
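A sketch of that direction heuristic, with hypothetical names: a carbon-copied outgoing message flushes CSI only if the last stanza exchanged with that JID was incoming, i.e. only if it plausibly clears a notification on the phone:

```python
last_direction = {}  # JID -> "in" or "out" (per-JID state, kept server-side)

def outgoing_flushes_csi(jid):
    """An outgoing body/read-marker wakes the phone only if the last stanza
    for this JID was incoming (i.e. a notification may be showing)."""
    flipped = last_direction.get(jid) == "in"
    last_direction[jid] = "out"
    return flipped

def incoming_seen(jid):
    """Record that the latest stanza exchanged with this JID was incoming."""
    last_direction[jid] = "in"
```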
Zash
Ge0rG: Consider me, writing here, right now, on my work station. Groupchat messages sent to my phone.
Ge0rG
just waking up on outgoing read-marker / body would be a huge improvement already
Ge0rG
Zash: yes, that groupchat message is supposed to clear any notification for the groupchat
Ge0rG
that = your
Zash
After the grace period ends, if there were anything high-priority since the last activity from that other client, then it should push.
Zash
Not done that yet tho I think
Zash
But as long as I'm active at another device, pushing to the phone is of no use
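A sketch of that grace-period idea, with hypothetical names and values; a real implementation would also arm a timer to flush held-back stanzas once the period lapses, rather than waiting for the next high-priority stanza as this simplification does:

```python
import time

GRACE = 120.0  # seconds counting as "another device is active" (hypothetical)

class Account:
    def __init__(self):
        self.other_active_at = 0.0  # last activity from a non-phone device
        self.pending = []           # high-priority stanzas held back meanwhile

def on_other_device_activity(acct, now=None):
    """Any activity from another device restarts the grace period."""
    acct.other_active_at = now or time.monotonic()

def on_high_priority(acct, stanza, now=None):
    """Return the stanzas to push to the phone right now (possibly none)."""
    now = now or time.monotonic()
    if now - acct.other_active_at < GRACE:
        acct.pending.append(stanza)  # "maybe later": user is active elsewhere
        return []
    batch, acct.pending = acct.pending + [stanza], []
    return batch  # grace period over: flush everything held back, then this
```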
Zash
Tricky to handle the case of an incoming message just after typing "brb" and grabbing the phone to leave
Zash
Especially with a per-stanza yes/no/maybe function, it'll need a "maybe later" response
Ge0rG
Zash: yeah. Everything is HARD
eta
also for all Slack's complicated diagrams, their notifications don't even work properly either
eta
like it doesn't dismiss them on my phone, etc
flow
Zash> And the way to solve that is to throw data away. Enjoy telling your users that.
I'd say that's why there is TCP on top of IP (which is where, I'd argue, the actual congestion and traffic flow control happens)
Zash
flow: With TCP, same as XMPP, you just end up filling up buffers and getting OOM'd
flow
Zash, I don't think those two are really comparable: with tcp you have exactly two endpoints, with xmpp one entity communicates potentially with multiple endpoints (potentially over multiple different s2s links)
Zash
(me says nothing about mptcp)
Zash
So what Ge0rG said about slowing down s2s links?
flow
I did not read the full backlog, could you summarize what Ge0rG said?
flow
(otherwise I have to read it first)
Zash
13:31:21
Ge0rG "Holger, Zash: we could implement per-JID s2s backpressure"
flow
besides, aren't there still only two endpoints involved in MPTCP (just using potentially multiple paths)?
flow
I am not sure if that is technically possible, the "per-JID" part here alone could be tricky
flow
it appears that implementing backpressure would likely involve signalling back to the sender, but what if the path to the sender is also congested?
Zash
I'm not sure this is even doable without affecting other users of that s2s link
flow
as of now, the only potential solution I could come up with is keeping the state server-side and having servers notify clients when the state changes, so that clients can sync whenever they want, and especially at whatever rate they want
flow
but that does not solve the problem for servers with poor connectivity
jonas’
let’s change xmpp-s2s to websockets / http/3 or whatever which supports multiple streams and will of course solve the scheduling issue of streams competing for resources and not at all draw several CVE numbers in that process :)
Zash
Not impossible to open more parallel s2s links...
Heh, you could open a secondary one for big bursts of stanzas like MUC joins and MAM ....
Zash
Like I think there were thoughts in the past about using a secondary client connection for vcards
jonas’
haha wat
Zash
Open 2 c2s connections. Use one as normal (presence, chat etc. there), except send some requests, like for vcards, over the other one, since those often contain big binary blobs that then wouldn't block the main connection :)
pulkomandy
Well… at this point you may start thinking about removing TCP (its flow control doesn't work in this case anyway) and doing something XML-over-UDP instead?