10.12.2004 14:07

ICP RAID Controller Status ERROR - mission impossible?

Today is one of those days when you don't want to wake up in the morning. Well, I had to, because one of my servers wasn't booting anymore ("LI" was the only thing that LILO [EN] was able to tell me).

The RAID controller of this system is an ICP GDT8546RZ [DE], so I called ICP Technical Support, where a very nice man helped me out by removing all RAID components from the RAID array and creating a single disk from one physical drive. "LI" was still the only thing I got, but it was nice to see that all my data seemed to be still alive.

After a quick investigation, I found out that this problem should be resolved in a few minutes. SHOULD! Reinstalling LILO would do the trick, but how do you reinstall LILO on a system that is far away from a standard configuration? I'm using SGI's XFS [EN] as the filesystem on this server, so I'd need an XFS boot CD. Blade's Woody XFS Netinst [EN] would do, but it has no gdth (ICP RAID controller) support on board. I tried to find a suitable gdth.o module for the 2.4.20-bf2.4-xfs kernel on this CD. After all the gdth.o modules I found via Google either hung the kernel when insmodded or required tons of symbols my kernel didn't export, I tried to build my own 2.4.20-bf2.4-xfs kernel on my notebook. Needless to say, on days like today you really need a floppy drive in your notebook, and guess what? I don't have one.

I spent the next two or three hours trying to get this system to boot with XFS and gdth support. I failed. While searching my notebook's filesystem for useful software to solve this problem, I stumbled across the KANOTIX Bug-Hunter 10 [DE] ISO image. "Well, let's give it a try," I thought. Just so you can imagine my situation: it's 12:00 noon, you should be on vacation, you've been working on the same problem for about 4 hours and you're not a damn little step ahead.

Nice surprise: KANOTIX Bug-Hunter 10 has XFS and gdth support included. mount, chroot, lilo, exit, umount, reboot. The system has now been running again (stable) for about two hours, and the data is currently being written to tape (you know, on days like this, it's good to have additional backups). After that, I'll make a ghost image and try to re-establish the RAID1 host drive. And after everything is done, I'll have a nice cold beer in a nearby pub and make my way back home to my family.
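
For anyone who ends up in the same spot, the "mount, chroot, lilo, exit, umount" part could look roughly like the sketch below. This is not a transcript of what I typed: the device name /dev/sda1 and the mount point /mnt/rescue are assumptions for illustration, and the run wrapper with its DRY_RUN switch is a hypothetical helper that only prints the commands unless you set DRY_RUN=0. It also assumes /boot lives on the root partition and that lilo is installed as /sbin/lilo on the broken system.

```shell
#!/bin/sh
# Sketch: reinstall LILO from a rescue live CD that has XFS and gdth support.
# ROOT_DEV and TARGET are assumptions; adjust them to the real system.
ROOT_DEV="${ROOT_DEV:-/dev/sda1}"   # hypothetical root partition on the ICP host drive
TARGET="${TARGET:-/mnt/rescue}"     # hypothetical empty mount point
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, run it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_lilo() {
    run mkdir -p "$TARGET"
    run mount -t xfs "$ROOT_DEV" "$TARGET"
    # Run the installed system's own lilo against its own /etc/lilo.conf.
    run chroot "$TARGET" /sbin/lilo -v
    run umount "$TARGET"
}

reinstall_lilo
# Afterwards: remove the CD and reboot.
```

Passing the command directly to chroot replaces the interactive "chroot, lilo, exit" steps with a single call. If /boot is a separate partition, it would have to be mounted below the mount point before running lilo.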

Note to myself: Use grub instead.