-
nuegia.net
Does anybody have a redundant Prosody server?
-
nuegia.net
high availability?
-
jonas’
define "high availability"
-
nuegia.net
service remains online if one of the servers is taken offline
-
nuegia.net
or suffers a failure
-
jonas’
"remains online" certainly not, because prosody does not support active/active configurations
-
jonas’
hot-standby may work actually if you use a replicated storage backend, but you still need custom logic to ensure the other node is truly down.
-
nuegia.net
has anyone already implemented that?
-
jonas’
not that I know, because it is a hard problem.
-
jonas’
determining that another node is truly down is a bit tricky.
-
jonas’
you need some kind of quorum system
-
jonas’
with a database backend which uses that, and by configuring prosody to only use the local backend, and making prosody shut down when the backend claims that it lost quorum, it could be done.
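
A minimal sketch of the quorum rule mentioned here, assuming a fixed, known cluster size and some external way to count reachable peers. The names `has_quorum` and `desired_state` are hypothetical and are not part of Prosody or any database.

```python
def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """Return True if this node plus the peers it can reach form a strict majority."""
    votes = reachable_peers + 1  # count this node itself
    return votes > cluster_size // 2


def desired_state(reachable_peers: int, cluster_size: int) -> str:
    """Hypothetical policy: the local service may only run while quorum is held."""
    return "run" if has_quorum(reachable_peers, cluster_size) else "stop"


if __name__ == "__main__":
    # In a 3-node cluster, a node that sees one peer holds quorum;
    # a node that sees nobody must stop.
    print(desired_state(1, 3))
    print(desired_state(0, 3))
```

With an even cluster size, a clean split leaves neither half with a strict majority, which is why odd sizes are usually preferred.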
-
nuegia.net
let's say that we can determine a node's health
-
nuegia.net
there's a plugin for prosody's internal webserver that exposes timers, and a simple script could be made that checks that and prosody's health
-
nuegia.net
possibly even latency
-
nuegia.net
and if you're going for an ACTIVE/BACKUP model, a circuit-breaker-style trip that switches to the backup if the primary server stops working or latency gets too bad.
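
A rough sketch of such a circuit breaker, assuming a probe that reports success and latency for each check. The class name, thresholds, and fields are hypothetical. A tripped breaker only signals that the primary looks unhealthy from where the probe runs; it does not by itself prove that the primary is down.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    max_failures: int = 3       # consecutive bad probes before tripping
    max_latency_s: float = 2.0  # probes slower than this count as bad
    failures: int = 0
    tripped: bool = False

    def record(self, ok: bool, latency_s: float) -> bool:
        """Record one probe result; return True once the breaker has tripped."""
        if ok and latency_s <= self.max_latency_s:
            self.failures = 0  # a good probe resets the consecutive-failure count
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.tripped = True  # stays tripped until reset by an operator
        return self.tripped


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2)
    for ok, latency in [(True, 0.1), (False, 0.0), (True, 5.0)]:
        print(breaker.record(ok, latency))
```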
-
nuegia.net
what else needs to be done?
-
nuegia.net
postgres already supports that model
-
jonas’
what you described isn't sufficient
-
jonas’
the latency may only look bad from your other node, e.g. because of a temporary internet routing issue
-
jonas’
you need to ensure that before starting the backup node, the active node is killed.
-
nuegia.net
the prosody daemon is killed or just not connectable?
-
jonas’
killed.
-
nuegia.net
why?
-
nuegia.net
if no clients or servers are able to connect to it, what's the harm?
-
jonas’
cronjobs inside prosody
-
nuegia.net
oh
-
nuegia.net
what else?
-
jonas’
I don't understand
-
nuegia.net
what do those cron jobs do? also you mentioned a replicated storage backend.
-
jonas’
for instance expiry of uploaded files/storage, but in general they run arbitrary code, and you cannot rely on them doing (or not doing) a specific thing
-
nuegia.net
what storage needs to be replicated? something in the filesystem or can everything be done in the database?
-
jonas’
the only safe way to do this is to ensure prosody is *stopped* on all nodes except one.
-
jonas’
I don't know of a way to put uploaded files into the database, so you'd likely need that (or an external service for that feature) + database for everything else.
-
nuegia.net
I already have prosody's http uploads managed by an external service as part of my webserver. a cron job on the external webserver manages file expiration based on atimes
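
A small sketch of atime-based expiry as it might run from such a cron job, assuming the uploads live in a single directory tree. The function name `expired_files` and its parameters are hypothetical. It only lists candidates and leaves deletion to the caller. Filesystems mounted with `noatime` or `relatime` may not update access times on every read, so the result depends on mount options.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def expired_files(root: str, max_age_s: float, now: Optional[float] = None) -> List[Path]:
    """Return files under root whose last access is older than max_age_s seconds."""
    now = time.time() if now is None else now
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and now - p.stat().st_atime > max_age_s
    )


if __name__ == "__main__":
    # List files in the current directory not accessed for 30 days.
    for path in expired_files(os.curdir, 30 * 24 * 3600):
        print(path)
```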
-
nuegia.net
is there anything else?
-
nuegia.net
prosody's only job for files is to generate tokens for the upload server
-
jonas’
the safe way of doing this, as far as I know (the prosody people would know better):
Prerequisites:
- ensure prosody uses replicated storage for everything, e.g. a database
- have reliable measurement of availability of all nodes
- have a way to "fence" (turn off) a node remotely and be sure it's actually off, even when it is already broken/unreachable
Failover:
1. detect that currently active node is down
2. make sure it stays down (i.e. turn it off)
3. ensure data is replicated correctly
4. start backup node
-
jonas’
anything other than exactly this failover procedure is gambling.
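
The ordering of the failover procedure described above can be sketched as follows. The four callables are hypothetical placeholders for site-specific checks and actions; nothing here is a Prosody API. The point of the sketch is that the backup is never started unless fencing and the replication check have both succeeded.

```python
from typing import Callable


def failover(
    active_is_down: Callable[[], bool],
    fence_active: Callable[[], bool],
    replica_is_consistent: Callable[[], bool],
    start_backup: Callable[[], None],
) -> str:
    """Run the failover steps in order and stop at the first one that fails."""
    # 1. detect that the currently active node is down
    if not active_is_down():
        return "active node healthy; nothing to do"
    # 2. make sure it stays down (fence it); must report confirmed success
    if not fence_active():
        return "aborted: could not confirm active node is off"
    # 3. ensure data is replicated correctly
    if not replica_is_consistent():
        return "aborted: replicated data not confirmed consistent"
    # 4. start the backup node
    start_backup()
    return "backup node started"


if __name__ == "__main__":
    print(failover(lambda: True, lambda: True, lambda: True, lambda: None))
```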
-
nuegia.net
» 3. ensure data is replicated correctly
is it talking about the database backend?
-
jonas’
yes.
-
nuegia.net
also something that's not covered: restoration of the primary server.
-
jonas’
should not need extra action with proper databases, but you never know.
-
jonas’
well restoration of the primary server is just like another failover.
-
nuegia.net
doesn't seem impossible
-
nuegia.net
» - have a way to "fence" (turn off) a node remotely and be sure it's actually off, even when it is already broken/unreachable
would iocage stop prosody1 suffice?
-
jonas’
I do not know what those words mean.
-
nuegia.net
» 3. ensure data is replicated correctly
how is this done?
-
nuegia.net
» <jonas’> I do not know what those words mean.
turning off the BSD jail that prosody belongs to
-
jonas’
seems like it would be sufficient.
-
jonas’
regarding "3. ensure data is replicated correctly" / "how is this done?" -> with a proper replicated database, that's going to be a given and doesn't need separate checking.
-
jonas’
if you use some homebrew hot-standby filesystem sync trickery, that's a different story.
-
nuegia.net
so you just mean configure a postgres cluster, not run an application-specific database table consistency check
-
jonas’
yes
-
nuegia.net
are there any benefits or drawbacks to configuring prosody to use databases instead of the files that it uses by default?
-
jonas’
yes.
-
jonas’
(but don't ask me for specifics)
-
Polarian
> are there any benefits or drawbacks to configuring prosody to use databases instead of the files that it uses by default?
Normally speed
-
Polarian
and maintainability
-
Polarian
and scriptability
-
jonas’
(the discussion moved on to prosody@conference.prosody.im)
-
Polarian
oh... ok nevermind then :)
-
jonas’
thanks :)
-
sch
Greetings to one and all
-
sch
Would it be correct to assume that an agent/transport (server component) JID can not return ping?
-
sch
I ask because a function that pings its own JID appears to indicate that the JID does not return the ping when the JID is a component.
-
Guus
every XMPP entity MUST respond to an IQ request - even if it doesn't understand it.
-
jonas’
correct.
-
MattJ
(re-asked in jdev, a more appropriate venue I think)
-
Guus
So, even if the component does not understand the 'ping' request, it should still return an error.
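
To illustrate the rule, here is a sketch of how a caller might classify what comes back after sending an XEP-0199 ping (`<ping xmlns='urn:xmpp:ping'/>` inside an IQ of type `get`). The function name `classify_reply` is hypothetical and no XMPP library is assumed; it only parses the reply stanza. Either a `result` or an `error` reply shows that the entity answered. Only a missing reply is a real timeout.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def classify_reply(xml_text: Optional[str], request_id: str) -> str:
    """Classify the stanza received in response to a ping with the given id."""
    if xml_text is None:
        return "timeout"  # nothing came back at all
    iq = ET.fromstring(xml_text)
    local_name = iq.tag.rsplit("}", 1)[-1]  # drop any stream namespace
    if local_name != "iq" or iq.get("id") != request_id:
        return "unrelated"
    if iq.get("type") == "result":
        return "pong"   # ping is supported and was answered
    if iq.get("type") == "error":
        return "error"  # entity answered, e.g. ping not supported
    return "unrelated"


if __name__ == "__main__":
    reply = "<iq xmlns='jabber:component:accept' type='result' id='p1'/>"
    print(classify_reply(reply, "p1"))
```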
-
jonas’
correct, MattJ.
-
sch
Pardon for cross-posting
-
sch
Guus, I run the component and it appears that I fail to receive a ping reply when the component pings itself.
-
sch
Both client and component have the same XEPs loaded.