-
jonas’
lovetox, I don’t see any issue here, so unlikely: https://github.com/horazont/muchopper/issues
-
Жокир
Do any popular servers actually implement XEP-0368? If yes, could anyone point me to any such servers?
-
jonas’
Жокир, https://compliance.conversations.im/ any servers listed there as "compliant" should do, at least for c2s
-
jonas’
I don’t know of any s2s implementation except maybe https://github.com/surevine/Metre , which isn’t quite a server.
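For reference, XEP-0368 advertises direct-TLS endpoints through SRV records named `_xmpps-client._tcp.<domain>` (c2s) and `_xmpps-server._tcp.<domain>` (s2s), next to the classic RFC 6120 `_xmpp-client._tcp` / `_xmpp-server._tcp` records, so checking a server mostly means looking those up. A minimal sketch that builds the names to query and orders already-fetched records by priority. The `SrvRecord` type and helper names are hypothetical; the actual DNS resolution needs a library such as dnspython and is not shown, and weighted selection within one priority is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    service: str   # hypothetical field: "_xmpps-client" (direct TLS) or "_xmpp-client"
    priority: int  # lower value = more preferred (RFC 2782)
    weight: int    # load balancing within one priority; ignored in this sketch
    port: int
    target: str


def srv_names(domain: str, s2s: bool = False) -> list[str]:
    """SRV names to look up: the XEP-0368 direct-TLS name, then the RFC 6120 one."""
    kind = "server" if s2s else "client"
    return [f"_xmpps-{kind}._tcp.{domain}", f"_xmpp-{kind}._tcp.{domain}"]


def connection_order(records: list[SrvRecord]) -> list[SrvRecord]:
    """Sort by priority only (simplification: no weighted shuffling)."""
    return sorted(records, key=lambda r: r.priority)


def uses_direct_tls(record: SrvRecord) -> bool:
    return record.service.startswith("_xmpps-")


# Illustrative records for a made-up domain.
records = [
    SrvRecord("_xmpp-client", 10, 0, 5222, "xmpp.example.org"),
    SrvRecord("_xmpps-client", 5, 0, 443, "xmpp.example.org"),
]
print([r.port for r in connection_order(records)])  # [443, 5222]
```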
-
Guus
Looking at my server log, I'm noticing that I'm getting a lot of connection timeouts on s2s in bursts - presumably x minutes after a user that caused the federation to be set up sent its last presence update.
-
Guus
I wonder if it'd be good to introduce a small factor of randomness to the timeout interval, to spread the disconnects out instead of having them all happen in one burst.
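The suggestion amounts to jittering the idle timeout. A minimal sketch, assuming a base timeout in seconds and a hypothetical ±10% spread; neither value is taken from any server's defaults:

```python
import random
from typing import Optional


def jittered_timeout(base_seconds: float, jitter: float = 0.1,
                     rng: Optional[random.Random] = None) -> float:
    """Scale base_seconds by a random factor drawn from [1 - jitter, 1 + jitter].

    Connections that went idle in the same burst then expire spread over a
    window of roughly 2 * jitter * base_seconds instead of at the same moment.
    """
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    return base_seconds * rng.uniform(1 - jitter, 1 + jitter)


# Hypothetical 15-minute base timeout with +/-10% jitter, seeded for repeatability.
rng = random.Random(42)
timeouts = [jittered_timeout(900, 0.1, rng) for _ in range(5)]
print([round(t) for t in timeouts])  # each value lies between 810 and 990
```

Drawing the jitter per connection (rather than once per server) is what breaks up the synchronized teardown.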
-
flow
Guus, maybe a controversial counter-question: that does sound like a timeout enforced at the application layer. If so, why would you have an application-layer timeout for s2s connections instead of simply letting the TCP connection time out?
-
jonas’
flow, to save resources.
-
jonas’
the TCP connection will also never time out on its own
-
flow
is it worth it?
-
jonas’
because both peers can see each other (in this scenario)
-
jonas’
file descriptors are limited and when you notice you’re running out of them, it’s too late
-
jonas’
being a bit proactive about preserving them is generally a good idea
-
flow
ok, so kill idle connections based on the number of available file descriptors, but not based on time
-
jonas’
you don’t know the amount of available file descriptors
-
flow
(or, to be precise, only use time as a second criterion)
-
jonas’
you know the limit, but you don’t know how many are open in your process
-
jonas’
you can estimate, but you can be wrong in the bad direction.
-
jonas’
(or in both directions, depending on how you estimate)
-
flow
ls /proc/$pid/fd/ | wc -l
-
flow
?
-
jonas’
that’s at least a rather expensive way to do it
-
jonas’
but true, that works, on systems where procfs has that feature
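A sketch of the procfs approach from inside the process, paired with the limit that is always known. It assumes a Linux-style `/proc/self/fd` and returns `None` where that directory doesn't exist (e.g. Windows, macOS); the function names are illustrative.

```python
import os
from typing import Optional


def count_open_fds() -> Optional[int]:
    """Count this process's open file descriptors via /proc/self/fd.

    Returns None where procfs (or that directory) is unavailable.
    """
    try:
        # listdir briefly opens the directory itself, so the result may include
        # that transient descriptor; it is consistent between calls.
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def fd_soft_limit() -> Optional[int]:
    """Soft RLIMIT_NOFILE, or None if unavailable (non-Unix) or unlimited."""
    try:
        import resource  # Unix-only standard module
    except ImportError:
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


print(count_open_fds(), fd_soft_limit())
```

Note that listing the directory still costs an open, one or more getdents calls and a close, and the count is only a snapshot: another thread may open or close descriptors right after.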
-
flow
why is it expensive?
-
Guus
flow I don't mind much either way - I'm just noticing that I get a lot of disconnects. I'm thinking Prosody does this? Openfire probably does so as well, though.
-
jonas’
flow, that’s many syscalls
-
Guus
(at the very least, it's configurable)
-
jonas’
I can’t see immediately in man 5 procfs whether /proc/$pid/fd is a linux or a posix thing
-
flow
jonas’, I wouldn't be surprised if there is a more efficient way to get that number
-
jonas’
I would
-
Guus
I don't mind much closing idle connections (although it does feel like premature optimization a bit.)
-
flow
especially on linux
-
jonas’
I think I looked into that already and found that it’s not possible
-
jonas’
there’s surely a reason why sudo does a for i in 0..MAXFD do close($i); done
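Without procfs, the portable way to find out which descriptors are open is brute force: probe every descriptor number up to the limit, one syscall each, which is presumably also why a loop like sudo's closes everything up to the maximum. A sketch of counting that way with `os.fstat` as the probe; the `upper` bound is a hypothetical parameter so the loop stays bounded when the soft limit is very large.

```python
import os


def count_open_fds_by_probing(upper: int) -> int:
    """Count open descriptors in [0, upper) by calling fstat on each number.

    Works without procfs, but costs one syscall per descriptor number,
    whether or not anything is open there.
    """
    count = 0
    for fd in range(upper):
        try:
            os.fstat(fd)
        except OSError:  # typically EBADF: nothing open at this number
            continue
        count += 1
    return count


print(count_open_fds_by_probing(1024))
```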
-
Guus
as Openfire is a multi-platform solution, depending on any platform specific thingy is going to be a pain.
-
Guus
unless Java exposes things, which I doubt.
-
flow
Guus, UnixOperatingSystemMXBean.getOpenFileDescriptorCount()
-
flow
not sure if something like that also exists for other OSes
-
Guus
*Unix*OperatingSystemMXBean is likely going to fail on Windows? 🙂
-
Guus
but also: not worth the complexity, maybe?
-
flow
so you may have to implement a fallback strategy for sure (like disconnection based on a timeout)
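The hybrid policy flow describes (descriptor pressure as the primary signal, a plain idle timeout as the secondary criterion and as the fallback when the count can't be measured) could look roughly like this. The function name and the 80% pressure threshold are hypothetical, and the caller is assumed to ask only about connections that are already idle.

```python
from typing import Optional


def should_close_idle(idle_seconds: float, timeout_seconds: float,
                      open_fds: Optional[int], fd_limit: Optional[int],
                      pressure: float = 0.8) -> bool:
    """Decide whether an idle s2s connection should be torn down.

    open_fds / fd_limit may be None when they can't be determined on this
    platform; the decision then degrades to a plain idle timeout.
    """
    under_pressure = (open_fds is not None and fd_limit is not None
                      and open_fds >= pressure * fd_limit)
    if under_pressure:
        return True  # running low on descriptors: any idle link is fair game
    return idle_seconds >= timeout_seconds  # secondary criterion / fallback


print(should_close_idle(10, 900, open_fds=900, fd_limit=1000))   # True
print(should_close_idle(10, 900, open_fds=None, fd_limit=None))  # False
```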
-
jonas’
I don’t see any problem with a timeout here, to be honest
-
flow
Guus, potentially, depends on your goals I'd say
-
jonas’
everything else seems slightly overengineered
-
jonas’
file descriptors may also just be one reason why you want to keep the number of open connections low
-
jonas’
other reasons may include running behind a stateful firewall and wanting to conserve resources there
-
Guus
I was just suggesting to add a small random factor in the timeout delay, nothing more 😉
-
flow
Guus, which is always a good idea
-
Guus
also, given how I see batches of s2s connections torn down only to be brought up again, I suspect that the default timeout (of Prosody?) might be on the low end.
-
jonas’
prosody doesn’t have a default timeout
-
Guus
oh, that's interesting
-
Guus
note that I didn't actually check what server software is used on those. I just assumed.
-
Guus
having had a closer look: might be ejabberd 🙂
-
jonas’
https://sotecware.net/files/noindex/connections.png
-
jonas’
:-)
-
flow
looks like a 30 minute timeout
-
flow
combined with an hourly cron job maybe?
-
jonas’
I think there are two timeouts, one ~15min (the linear curve down) and one ~30min (which also looks randomized, because of the slight exp-y behaviour at the end)
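A toy model of why a fixed timeout gives a straight falloff: if a scan opens connections at a roughly constant rate and each one closes after the same timeout, the open count drops linearly over a window as long as the scan itself. The numbers below are made up for illustration, not read off the graph.

```python
def open_connections(open_times: list[float], timeouts: list[float], t: float) -> int:
    """Number of connections open at time t; each is open on [start, start + timeout)."""
    return sum(1 for start, timeout in zip(open_times, timeouts)
               if start <= t < start + timeout)


# Ten connections opened one minute apart, all with a fixed 15-minute timeout.
opens = [float(m) for m in range(10)]
fixed = [15.0] * 10
print([open_connections(opens, fixed, t) for t in (10, 15, 20, 24)])  # [10, 9, 4, 0]
```

Swapping the fixed timeouts for jittered ones smears each close time, which would round off the end of the curve instead of leaving a sharp corner.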
-
jonas’
and yes, these are the connection stats of search.jabber.network, and the spikes you see are the hourly scan :)
-
Guus
(maybe randomize your scan!)
-
flow
now we only need to identify the implementations with the 15m and 30m (randomized) timeout
-
jonas’
Guus, it’s already shuffled :)
-
flow
and what is keeping the baseline of 1.5k connections?
-
Guus
moar shuffling!
-
jonas’
flow, compare the ratios with https://search.jabber.network/stats#software :)
-
jonas’
assuming that many "unknowns" are in fact prosody MUCs, because prosody doesn’t report version on MUC by default IIRC
-
flow
ahh, so it is probably prosody which keeps the connections
-
jonas’
flow, very likely
-
flow
but the amount of 15m and 30m timeout connections appears to be nearly equal
-
jonas’
I experimented with loading mod_s2s_idle_timeout or whatsitcalled on s.j.n, but then I disabled it to reduce the codebase to the minimum for some unrelated testing
-
jonas’
flow, I’ll have to dig deeper into it, it’s also possible that the two different falloff behaviours there are an artifact of how the scanner works
-
flow
i see
-
jonas’
since there are two scanning components, and one finishes much quicker than the other, it’s possible that the quicker one causes the additional tip of the initial spike, while the slower one causes the slow falloff at the end
-
jonas’
since the quicker one also tends to touch more domains
-
jonas’
oh yeah, that’s very plausible
-
jonas’
that may also explain the exp falloff due to shuffling
-
jonas’
if there’s really just a 15m or something timeout involved
-
flow
jonas’, are you aware that 'German' appears twice in the room languages table?
-
jonas’
yes
-
jonas’
de-de vs. de
-
jonas’
I need to normalize that
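One possible normalization, assuming the stats should group rooms by primary language only: language tags are case-insensitive, so lowercasing and keeping the primary subtag maps "de-DE", "de-de" and "de" to the same bucket. The helper names and the sample counts are illustrative.

```python
from collections import Counter


def normalize_language(tag: str) -> str:
    """Reduce a language tag to its lowercase primary subtag ("de-DE" -> "de")."""
    return tag.strip().split("-", 1)[0].lower()


def merge_language_counts(counts: dict[str, int]) -> Counter:
    """Sum per-tag counts into per-primary-language counts."""
    merged: Counter = Counter()
    for tag, n in counts.items():
        merged[normalize_language(tag)] += n
    return merged


print(dict(merge_language_counts({"de": 120, "de-DE": 30, "en": 400})))
# {'de': 150, 'en': 400}
```

This discards region and script subtags entirely (e.g. "zh-Hant" and "zh-Hans" would merge); if that detail matters, lowercasing alone and only collapsing exact case variants is the more conservative choice.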
-
jonas’
https://sotecware.net/files/noindex/connections-1h.png
-
jonas’
https://sotecware.net/files/noindex/ingestion-1h.png
-
jonas’
that seems to fit very well
-
jonas’
(the "filled" part in the second graph is the fast component, the "line" part in the second graph is the slow component)
-
jonas’
the fast component ends at 07:24, which is exactly when the initial spike drops in the first graph