-
Martin
jonas’: I got o.j.n reports at 09:07 and 09:12 but I don't see anything in my logs. If you are free and it isn't too much effort, could you have a look at what caused them? I am really clueless.
-
Martin
https://files.mdosch.de:5281/upload/kxJJ3xeOt6uqDKsS/2021-03-30-103332_scrot.png
-
jonas’
I got prober infrastructure related alerts around the same time
-
jonas’
and BCCs of yax.im alerts
-
Ge0rG
yax.im had downtime issues today
-
Ge0rG
I rebooted the server shortly after 9 CEST
-
jonas’
Martin, yeah, both prober nodes had issues, looks like an outage of the monitoring itself.
-
jonas’
Ge0rG, I sure hope that was not related.
-
Ge0rG
jonas’: should I check my logs for whether ojn killed the server? ;)
-
Ge0rG
I think it was rather related to /proc being dysfunctional.
-
jonas’
Ge0rG, please validate when exactly you restarted
-
Ge0rG
10:39:05 up 1:20, 1 user, load average: 0.18, 0.23, 0.19
-
jonas’
that would make it 09:19 CEST?
-
Ge0rG
Yeah.
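
The 09:19 figure follows from subtracting the uptime from the wall-clock time in the `uptime` output above. A minimal sketch of that arithmetic, assuming the "H:MM" uptime form shown there; the helper name `boot_time` is made up for illustration, and since `uptime` only reports whole minutes the result is good to within about a minute:

```python
from datetime import datetime, timedelta


def boot_time(now_hms: str, up_hm: str) -> str:
    """Estimate the boot time from the clock time and 'H:MM' uptime printed by `uptime`."""
    now = datetime.strptime(now_hms, "%H:%M:%S")
    hours, minutes = map(int, up_hm.split(":"))
    # Subtract the reported uptime from the current clock time.
    booted = now - timedelta(hours=hours, minutes=minutes)
    return booted.strftime("%H:%M:%S")


print(boot_time("10:39:05", "1:20"))  # prints 09:19:05
```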
-
jonas’
that’s 5 minutes after the first ojn prober node alert came
-
jonas’
this is really strange
-
jonas’
good thing I have a kibana to look at this later on
-
Ge0rG
jonas’: Right. I reacted to the prober alert by switching screen terminals in SSH, and everything was laggy and felt b0rked. It's quite possible that there was some global carrier outage.
-
Ge0rG
But as the machine was in a weird state anyway I decided to reboot.
-
jonas’
that would be an interesting outage because both vantage points reported >50% error rate
-
jonas’
(hetzner-AS and conova-AS)
-
jonas’
and it affected at least you and martin
-
jonas’
I’ll have to take a careful look at everything later
-
Ge0rG
jonas’: I don't have any anomalies on the icmp graphs from hosteurope
-
Martin
jonas’, Ge0rG: Thanks for the clarification.
-
jonas’
Ge0rG, Martin, if one of you is running with debug logs enabled, can you check whether the ojn prober still sends garbage after </stream:stream>, or whether that also magically disappeared with the update I just did?
-
Martin
No debug logs here
-
Martin
So the prober sometimes sent garbage? Was this the reason for this morning's alarms? Then I wonder why it didn't happen earlier or more often. 🤔