Kernel panic after hard reboot and fsck

nikkadim · July 2, 2010, 11:04am

Could you please help with a problem with a megaraid controller on a Dell PowerEdge 2850? All that I can see is on this screenshot:

All drives successfully passed verification from the LSI controller utility (Ctrl+A at startup). I also tried to boot from a rescue live CD, mount all the mirrored drives and check them with fsck: OK.

pludi · July 2, 2010, 11:32am

Looks like something happened to the /etc/fstab file, or maybe even to the initrd. Can you boot up the Live-CD and post the content of the fstab file?

One other solution would be to boot up using the installation disks, start the rescue system, and try to re-create the initrd from there.

Also, fsck only checks the filesystem's consistency, but that doesn't mean that it will correct the contents of files (after all, how should it know the correct contents?).

nikkadim · July 5, 2010, 1:24am

I have booted from the RH DVD via linux rescue and switched to chroot /mnt/sysimage (linked to /dev/sda1).

Listing of my fstab is here:

listing of fdisk -l | more here:

ygemici · July 5, 2010, 8:18am

Were any of the LEDs on your server lit?
Maybe there is a hardware failure in the system.

Can you post the error details?
Maybe you need a new kernel, or you need to reload the megaraid module into the initrd.

regards
ygemici

nikkadim · July 5, 2010, 8:35am

Thank you for the reply.
I have checked the lights on top, but did not find any hardware error lights.

Could you please explain how I can reload the initrd?
I have tried running

mkinitrd -v -f /boot/initrd-2.6.9-5.EL.img 2.6.9-5.EL

and I can see the modules (megaraid included) being gathered, but it stops with an error...

fpmurphy · July 5, 2010, 9:05am

Looks like it cannot locate the find command. Have you manually checked to see if you can locate this command?

nikkadim · July 5, 2010, 9:50am

This was my mistake (incomplete mounting). I have rebuilt the initrd but still get the same result: kernel panic.
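A missing find inside the chroot usually means that not every filesystem of the installed system was mounted. A minimal sketch of a pre-flight check to run before mkinitrd; the helper name missing_tools and the tool list in the usage comment are assumptions for illustration, not something taken from the mkinitrd documentation:

```shell
# Hypothetical helper: print every named command that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage inside the chroot (the tool list is an assumption):
#   missing_tools find cpio gzip
# No output means all of the listed commands are reachable.
```

If anything is printed, it is probably worth checking which partitions are mounted under the chroot before retrying the rebuild.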

ygemici · July 5, 2010, 10:31am

# export PATH=$PATH:/usr/bin:/usr/sbin:/bin:/sbin:/usr/bin

# mkinitrd -f -v --with=megaraid /boot/initrd-2.6.9-5.EL_new.img  $(uname -r)

Edit grub.conf like this and change the "default" line:

 
# cat /boot/grub/grub.conf
default=1
....
....
title Red Hat Enterprise Linux Server New initrd-2.6.9-5.EL_new.img (2.6.9-5.EL_new)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-5.EL ro 
        initrd /initrd-2.6.9-5.EL_new.img

Reboot and let us see what happens.

nikkadim · July 6, 2010, 10:10am

[mod] Broken Links to Images Removed [/mod]

I have also tried changing LABEL to /dev/sdaX in my fstab, but the OS boot ends with this:

---------- Post updated at 12:49 AM ---------- Previous update was at 12:21 AM ----------

Thank you!

I have rebuilt the initrd with

mkinitrd -f -v --with=megaraid_mm  --with=megaraid_mbox /boot/initrd-2.6.9-5.EL_new.img 2.6.9-5.EL

because /etc/modprobe.conf contains

alias scsi_hostadapter megaraid_mbox

but after reboot I have the same screen.
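The choice of --with= options above follows the scsi_hostadapter alias in modprobe.conf. A minimal sketch of reading those aliases from the file instead of typing them by hand; both helper names are hypothetical, and the file layout is assumed to be the usual "alias NAME MODULE" lines:

```shell
# Hypothetical helper: list modules named by scsi_hostadapter aliases.
hostadapter_modules() {
    awk '$1 == "alias" && $2 ~ /^scsi_hostadapter/ { print $3 }' "$1"
}

# Hypothetical helper: turn that list into --with= options on one line.
with_flags() {
    hostadapter_modules "$1" | sed 's/^/--with=/' | paste -sd ' ' -
}

# Usage (image name and kernel version taken from the thread):
#   mkinitrd -f -v $(with_flags /etc/modprobe.conf) \
#       /boot/initrd-2.6.9-5.EL_new.img 2.6.9-5.EL
```

This only picks up modules that have an alias line; a dependency such as megaraid_mm would still have to be added by hand if it is not listed.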

---------- Post updated at 02:14 AM ---------- Previous update was at 12:49 AM ----------

last dmesg available here

---------- Post updated at 09:10 AM ---------- Previous update was at 02:14 AM ----------

The problem was that RAID was disabled in the BIOS. After reboot I now get these screens:

and after fsck:

fpmurphy · July 6, 2010, 10:45am

Looks like grub cannot find your initramfs after you fsck'ed the root filesystem - it is probably in lost+found. The previous "RAMDISK: Ran out of compressed data" message in your screenshot was a clue that there was something wrong with it anyway. You should rebuild it.
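A "Ran out of compressed data" message suggests a truncated image. A minimal sketch of checking the image from the rescue environment before rebooting, assuming the initrd is a gzip-compressed file; the helper name initrd_is_intact is hypothetical:

```shell
# Hypothetical helper: succeed only if the file exists, is non-empty,
# and is a complete gzip stream (assumes a gzip-compressed initrd).
initrd_is_intact() {
    [ -s "$1" ] && gzip -t < "$1" 2>/dev/null
}

# Usage from the rescue environment (path built from names in the thread):
#   initrd_is_intact /mnt/sysimage/boot/initrd-2.6.9-5.EL.img \
#       || echo "initrd missing or damaged, rebuild it"
```

A passing check only shows that the compression is complete; it says nothing about whether the right modules are inside.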

ygemici · July 6, 2010, 2:28pm

Can you post the output of this?

 # cat /etc/grub.conf

nikkadim · July 7, 2010, 2:26am

in grub.conf

It turned out that it was necessary to add the path /boot/ for kernel and initrd.
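GRUB resolves the kernel and initrd paths relative to the partition named in the root (hdX,Y) line, so the /boot/ prefix is needed when /boot is a directory on the root filesystem rather than its own partition. A minimal sketch of deciding this from fstab; the helper name grub_prefix is hypothetical:

```shell
# Hypothetical helper: print the prefix to put in front of vmlinuz/initrd
# paths in grub.conf. Prints nothing if fstab lists /boot as its own mount
# point, and "/boot" if /boot lives on the root filesystem.
grub_prefix() {
    awk '$1 !~ /^#/ && $2 == "/boot" { separate = 1 }
         END { if (!separate) print "/boot" }' "$1"
}

# Usage:
#   echo "kernel $(grub_prefix /etc/fstab)/vmlinuz-2.6.9-5.EL ro"
```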

Now the system starts to boot and stops here:

and after fsck -t ext2 /dev/sda1 -y:

ygemici · July 7, 2010, 4:28am

Reboot into single-user mode, then run fsck.

nikkadim · July 7, 2010, 7:11am

Yes, I have tried this, but on the next boot the system again makes me check the filesystem...

and after a pause:

Corona688 · July 7, 2010, 12:03pm

It appears some vital programs have been corrupted. Do you have a backup?

nikkadim · July 7, 2010, 12:29pm

No... I have no backup; this system is based on mirror RAID.
Maybe I can copy some programs from the distribution DVD for this version?

Corona688 · July 7, 2010, 12:46pm

Mirror RAID is not a backup. It's less sensitive to disk failure but, as you've seen, there are other ways for filesystems to go wrong.

It's possible. We've got no idea how deep this corruption goes, though; it's possible a reinstall may be needed to fix this. Boot a livecd and get your vital data backed up before you try anything else.
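A minimal sketch of such a backup from the live CD, assuming the damaged filesystem is mounted read-only and the archive goes to some other disk. The helper name backup_tree and the /mnt/src and /mnt/backup paths are hypothetical:

```shell
# Hypothetical helper: archive everything under a mounted directory
# into a gzip-compressed tarball stored somewhere else.
backup_tree() {
    src=$1
    dest=$2
    tar -czf "$dest" -C "$src" .
}

# Usage from the live CD (mount points are hypothetical):
#   mount -o ro /dev/sda1 /mnt/src
#   backup_tree /mnt/src /mnt/backup/sda1-root.tar.gz
```

Mounting read-only keeps the copy from changing the damaged filesystem any further. The destination must not be on the disk being rescued.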

nikkadim · July 8, 2010, 5:52am

When I booted from the live CD and mounted the /dev/sdaX partitions, I found that /sbin/mingetty works, and I did not find anything in lost+found.

Now I can't fix the problem with inode 48912. I have run e2fsck again and again from the live CD on the root /dev/sda1 (unmounted, of course), but the error still exists...

Corona688 · July 8, 2010, 10:53am

Keep this up and it may be completely unable to mount soon, making recovery next to impossible. Stop MESSING with it for a few minutes and BACK IT UP!

Is this a hardware or software raid?

nikkadim · July 8, 2010, 1:15pm

This is hardware RAID, on a Dell PowerEdge 2850 via an LSI SCSI RAID adapter.

A few weeks ago the server went unresponsive: it was powered on but showed nothing on the display and could not be pinged, so there was no remote access. On reboot the server said that memory/battery problems were detected, that the adapter had recovered but cached data was lost, and to hit any key to continue. The reboot continued fine, Linux and Oracle ran fine, and then a few hours later the server went into the same state, with the same message on reboot. The server has been online for just over 3 years.
I have tried replacing the memory, but got the same result.