About the outages (July 1st and 2nd 2024)

m-p{3}@lemmy.ca · 4 months ago

Shadow@lemmy.ca · 4 months ago

Something got into a weird state, and restarting either the backend or the frontend didn’t help. Taking the entire stack down and bringing it back up resolved it.

It’s weird because it crashed at 1am, and at 3am we gradually restart all backends and frontends, so that automatic restart should have fixed it too. All the containers reported healthy, but nginx wasn’t reporting any available frontends.

I suspect some sort of weird Lemmy bug, but for now we’ll just have to improve monitoring and try to debug this further if it happens again.
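
For anyone curious what "improve monitoring" could look like: since the containers all reported healthy while nginx had no frontends to route to, one option is a probe that goes through nginx itself rather than trusting per-container health checks. This is only a rough sketch, not what we actually run. The names `classify_status` and `probe` are made up for illustration, and treating 502/503/504 as "nginx has no usable upstream" is an assumption based on how nginx typically responds in that situation.

```python
import urllib.error
import urllib.request

# Assumption: nginx usually answers with one of these codes when it
# cannot reach (or has no) live upstream servers.
UPSTREAM_FAILURE_CODES = {502, 503, 504}


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a coarse state."""
    if status is None:
        return "unreachable"   # nginx itself did not answer
    if status in UPSTREAM_FAILURE_CODES:
        return "no_upstream"   # nginx is up but could not reach a frontend
    if 200 <= status < 400:
        return "ok"
    return "error"             # some other client or server error


def probe(url, timeout=5.0):
    """Request the public URL through nginx and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; the code is still usable.
        return classify_status(exc.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, and similar.
        return classify_status(None)
```

Running something like `probe("https://lemmy.ca/")` on a schedule and alerting on anything other than `"ok"` would catch the case where every container looks healthy but users still cannot load the site.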