Just as I was showing off the stability of my server to Aram (his host’s server keeps falling over), what should happen? You guessed it: my server crashed.
It looks as though the kernel panicked, although I can’t tell why, since nothing was written to the logs. The machine then rebooted, but the bootloader, grub, chose just that moment to break. Since I couldn’t get the machine to boot from anything but the hard drives (both carrying the broken grub), I had to call in some helping hands from 49pence. At roughly 00:30 on a Saturday morning.
After waiting about an hour for them to call back, I went to bed, and could hardly sleep. I take this all too seriously… Anyway, I gave ’em another call this morning and quickly jumped in the shower. Naturally, this time they called back while I was still in there. I jumped out and got the ball rolling again. I got them to turn on serial console redirection (so I could work on the box remotely) and pop in a Debian install CD. Using that, and some cunning, I managed to roll back to a working version of grub and reboot. Problem solved, about 11.5 hours after the initial crash.
Lessons learned:
- Do not upgrade anything as critical as a bootloader if it works and has no security holes. Just don’t do it. No matter how much better the new one is.
- Keep an install CD (or other rescue CD) in the drive to boot off in emergencies. I could have had the machine back up again within 30 minutes.
- Keep serial console redirection on all the time.
- Don’t brag.
So, problem solved, and I shouldn’t need helping hands again unless the hardware breaks. Oh, and I got the helping hands for free, since they somehow missed my first call.