XMPP Service Operators - 2021-03-30


  1. Martin

    jonas’: I got o.j.n reports at 09:07 and 09:12, but I don't see anything in my logs. If you are free and it isn't too much effort, could you have a look at what caused them? I am really clueless.

  2. Martin

    https://files.mdosch.de:5281/upload/kxJJ3xeOt6uqDKsS/2021-03-30-103332_scrot.png

  3. jonas’

    I got prober infrastructure related alerts around the same time

  4. jonas’

    and BCCs of yax.im alerts

  5. Ge0rG

    yax.im had downtime issues today

  6. Ge0rG

    I rebooted the server shortly after 9 CEST

  7. jonas’

    Martin, yeah, both prober nodes had issues, looks like an outage of the monitoring itself.

  8. jonas’

    Ge0rG, I sure hope that was not related.

  9. Ge0rG

    jonas’: should I check my logs for whether ojn killed the server? ;)

  10. Ge0rG

    I think it was rather related to /proc being dysfunctional.

  11. jonas’

    Ge0rG, please validate when exactly you restarted

  12. Ge0rG

    10:39:05 up 1:20, 1 user, load average: 0.18, 0.23, 0.19

  13. jonas’

    that would make it 09:19 CEST?

  14. Ge0rG

    Yeah.

  15. jonas’

    that’s 7 minutes after the first ojn prober node alert came

  16. jonas’

    that’s 5 minutes after the first ojn prober node alert came

  17. jonas’

    this is really strange

  18. jonas’

    good thing I have a kibana to look at this later on

  19. Ge0rG

    jonas’: Right. I reacted to the prober alert by switching screen terminals in SSH, and everything was laggy and felt b0rked. It's quite possible that there was some global carrier outage.

  20. Ge0rG

    But as the machine was in a weird state anyway I decided to reboot.

  21. jonas’

    that would be an interesting outage because both vantage points reported >50% error rate

  22. jonas’

    (hetzner-AS and conova-AS)

  23. jonas’

    and it affected at least you and martin

  24. jonas’

    I’ll have to take a careful look at everything later

  25. Ge0rG

    jonas’: I don't have any anomalies on the icmp graphs from hosteurope

  26. Martin

    jonas’, Ge0rG: Thanks for the clarification.

  27. jonas’

    Ge0rG, Martin, is one of you folks running with debug logs enabled? If so, can you check whether the ojn prober still sends garbage after </stream:stream>, or whether that also magically disappeared with the update I just did?

  28. Martin

    No debug logs here

  29. Martin

    So the prober sometimes sent garbage? Was this the reason for this morning's alarms? Then I wonder why it didn't happen earlier or more often. 🤔