-
madmalkav
hey guys, what do you check, and how, to monitor the health of your XMPP servers? do you check the processes, the listeners, the availability of the DB, ...
-
madmalkav
anything else
-
madmalkav
?
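
For the listener part of that list, a minimal sketch in Python (standard library only) could look like the following. The host name is a placeholder, and 5222/5269 are the conventional client-to-server and server-to-server ports, which a given deployment may have changed. A successful TCP connect only shows that something accepts connections on the port, not that the XMPP service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_listeners(host, ports):
    """Map each named listener to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


# Example call (placeholder host, conventional XMPP ports):
# check_listeners("xmpp.example.org", {"c2s": 5222, "s2s": 5269})
```
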
-
pep.
I guess you can have a client that connects every so often and tries to do some actions
-
edhelas
madmalkav I check the complaints on Twitter
-
edhelas
https://status.conversations.im/ is actually doing that, pep., no?
-
pep.
No
-
pep.
That checks disco I think
-
pep.
I mean it depends how thorough you want to be
-
madmalkav
I don't want my company to call me at 4am because some operators are not receiving notifications on their cells
-
pep.
Well, if it's not the company it would be some kind of automated notification, you can't really escape that
-
madmalkav
and if that happens, I want my scripts to start the incident measures before people start calling in panic
-
madmalkav
🙂
-
madmalkav
yeah, but if I automate it properly I can, e.g., send a warning to the 24/7 support team so they take some steps and, if the steps fail, call me
-
madmalkav
e.g., if what is failing is the DB, call the DB guys, not me 😃
-
pep.
So the important bit is not the 4am, it's that you know before your users :)
-
madmalkav
being serious now, the important thing is detecting problems as early and as accurately as possible to keep downtime to a minimum
-
pep.
So yeah depending on how thorough you want to be, you can have checks on the server, and also emulate client behaviour
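
As a very small first step towards emulating a client, one could open a stream and see whether the server answers with its own stream header. The sketch below uses only the Python standard library; the function name and defaults are made up for illustration. It assumes the server uses the customary `stream:` prefix in its reply, and it does not attempt STARTTLS, authentication or message delivery, so it only shows that the listener speaks XMPP.

```python
import socket

# Opening stream header a client sends on a client-to-server connection.
STREAM_HEADER = (
    "<?xml version='1.0'?>"
    "<stream:stream to='{domain}' xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>"
)


def xmpp_stream_opens(host, domain, port=5222, timeout=5.0):
    """Send a stream header and report whether a stream header comes back."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(STREAM_HEADER.format(domain=domain).encode("utf-8"))
            reply = b""
            # Read until the reply header shows up, the peer closes the
            # connection, or a small size limit is reached.
            while b"<stream:stream" not in reply and len(reply) < 4096:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                reply += chunk
            return b"<stream:stream" in reply
    except OSError:
        # Connection refused, timeout, DNS failure, reset, ...
        return False
```

A fuller check would log in with a test account and exchange a message, which is easier with an XMPP library than with raw sockets.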
-
madmalkav
I can monitor with shell scripts or Python, but I will need help with that, and I can send warnings to the 24/7 team so they take measures, or directly call me or other teams
-
pep.
If you do something like that for the client I'm also interested BTW :-°
-
Holger
I have Nagios/Icinga anyway and use a simple check plugin to talk to an echo bot.
-
madmalkav
If I do something like that I probably won't be able to share it. I will have to redo it in a different way so the company doesn't say I'm stealing their IP
-
Holger
(Plus separate database/HTTP checks, but that's about it.)
-
madmalkav
fuck big companies
-
Holger
I would suggest a proper monitoring tool rather than inventing wheels.
-
Holger
reinventing even
-
madmalkav
I thought you knew me well enough to know I don't have a say in deciding what is done and how
-
madmalkav
😃
-
Holger
Using existing software is bad?
-
Holger
Seems you weren't forced to implement an XMPP server from scratch at least :-)
-
madmalkav
here there is a lot of internally developed software that is totally obsolete, but the bosses are slow to move on. I've been hearing about moving monitoring to Zabbix for at least 2 years
-
madmalkav
My boss is a reasonable guy with lots of technical knowledge and he wants to run away from reinventing the wheel as soon as possible, but he is an exception here
-
madmalkav
So, do Zabbix / Nagios / ... already have XMPP monitoring modules?
-
Holger
Dunno about Zabbix, maybe that has stuff built in. Nagios/Icinga and others have nothing built in but call simple stand-alone scripts/binaries to perform the actual checks.
-
Holger
You'll find tons of such checks on the web, I wrote the XMPP check I'm using myself though.
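
To illustrate what such a stand-alone check looks like, here is a rough Python sketch of the plugin convention: the script prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). This is not Holger's plugin; the thresholds, the function names and the idea of feeding it an echo round-trip time are assumptions made for the example.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(round_trip_seconds, warn=2.0, crit=5.0):
    """Map an echo round-trip time (None means no reply) to (status, message)."""
    if round_trip_seconds is None:
        return CRITICAL, "no echo reply received"
    if round_trip_seconds >= crit:
        return CRITICAL, f"echo took {round_trip_seconds:.1f}s"
    if round_trip_seconds >= warn:
        return WARNING, f"echo took {round_trip_seconds:.1f}s"
    return OK, f"echo took {round_trip_seconds:.1f}s"


def report(status, message):
    """Print the one-line plugin output and return the matching exit code."""
    print(f"XMPP {LABELS[status]} - {message}")
    return status


# In a real plugin the measurement would come from talking to the echo bot
# (hypothetical helper, not shown):
#   import sys
#   sys.exit(report(*classify(measure_echo_round_trip())))
```

The monitoring system then decides whom to notify based on the exit code, which keeps the check itself small.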
-
madmalkav
yeah, I'm probably overthinking it
-
pep.
Also forgot to mention metrics, but he's gone now
-
pep.
(Prometheus and the like, some services export this kind of stuff directly)
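
For readers unfamiliar with this kind of export, metrics are usually served as plain text that Prometheus scrapes over HTTP. The Python sketch below renders a single gauge in that text format; the metric name `xmpp_up` and its label are invented for the example, and label values are not escaped here, so it is only suitable for simple values.

```python
def render_gauge(name, help_text, value, labels=None):
    """Render one gauge sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        # Sort labels so the output is stable between runs.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )


# Example: a made-up metric saying whether the last probe succeeded.
# print(render_gauge("xmpp_up", "Whether the last probe succeeded", 1,
#                    {"domain": "example.org"}))
```
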