Boot process hang

Hello,

Sometimes the boot process hangs.
I am using RHEL 6.2.
When that happens, the console shows:

Probing EDD (edd=off to disable)...

The SSH service seems to have started, but I can't log in.

Last lines of the ssh client output in verbose mode (level 3):

debug2: we did not send a packet, disable method
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/identity
debug3: no such identity: /root/.ssh/identity
debug1: Offering public key: /root/.ssh/id_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug3: Wrote 368 bytes for a total of 1477

In /var/log/messages I see:

Jan 14 10:56:16 0-10 ntpd[3732]: kernel time sync status 2040
Jan 14 10:56:16 0-10 ntpd[3732]: frequency initialized 16.732 PPM from /var/lib/ntp/drift
Jan 14 10:56:19 0-10 mlogd[4230]: recv buffer size of unix domain data socket: 8192000
Jan 14 10:56:19 0-10 mlogd[4230]: send buffer size of inet domain data socket: 8192000
Jan 14 10:56:19 0-10 mlogd[4230]: start tcp connection listener...
Jan 14 10:56:19 0-10 mlogd[4230]: accepted connection[0] from 127.0.0.1
Jan 14 10:56:19 0-10 mlogd[4230]: connection from 127.0.0.1 established 
Jan 14 10:56:26 0-10 kernel: nf_conntrack: table full, dropping packet.
Jan 14 10:56:26 0-10 kernel: nf_conntrack: table full, dropping packet.
Jan 14 10:56:26 0-10 dpMgr: 6whasapi.c(414)[Info]:ha6wInitialize:create socket success, fd: 24
Jan 14 10:56:26 0-10 kernel: process `snmpd' is using obsolete setsockopt SO_BSDCOMPAT
Jan 14 10:56:26 0-10 kernel: fpn_shmem: fpn_shmem module initialized ffff88086597cbc0
Jan 14 10:56:26 0-10 kernel: VFCOUNT 0
Jan 14 10:56:26 0-10 kernel: pkp 0000:84:00.0: PCI INT A -> GSI 56 (level, low) -> IRQ 56
Jan 14 10:56:26 0-10 kernel: pcieport 0000:80:02.0: AER: Multiple Corrected error received: id=8010

In a successful boot, by contrast, the log at the same point shows:

Jan 14 11:04:07 0-10 kernel: VFCOUNT 0
Jan 14 11:04:07 0-10 kernel: pkp 0000:84:00.0: PCI INT A -> GSI 56 (level, low) -> IRQ 56
Jan 14 11:04:07 0-10 kernel: pcieport 0000:80:02.0: AER: Multiple Corrected error received: id=8010
Jan 14 11:04:08 0-10 kernel: pcieport 0000:80:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=8010(Transmitter ID)
Jan 14 11:04:08 0-10 kernel: pcieport 0000:80:02.0:   device [8086:3c04] error status/mask=00001100/00002000
Jan 14 11:04:08 0-10 kernel: pcieport 0000:80:02.0:    [ 8] RELAY_NUM Rollover    
Jan 14 11:04:08 0-10 kernel: pcieport 0000:80:02.0:    [12] Replay Timer Timeout  
Jan 14 11:04:08 0-10 kernel: pcieport 0000:80:02.0: AER: Multiple Corrected error received: id=8010
Jan 14 11:04:09 0-10 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Jan 14 11:04:09 0-10 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.6wind.x86_64 #1
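The status/mask pair in the successful-boot log can be decoded by hand: 0x00001100 = 0x1000 + 0x0100, i.e. bits 12 and 8, which are exactly the two errors the kernel lists underneath it, while the mask 0x00002000 is bit 13. A minimal Python sketch of that decoding, with bit names taken from the PCIe Correctable Error Status register (only a subset of bits is listed; the table and decode_status are illustrative helpers, not a kernel API):

```python
from typing import List, Tuple

# Subset of the PCIe Correctable Error Status register bit positions.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def _bits(value: int) -> List[Tuple[int, str]]:
    """List (bit, name) for every set bit in a 32-bit register value."""
    return [
        (b, CORRECTABLE_BITS.get(b, "bit %d" % b))
        for b in range(32)
        if (value >> b) & 1
    ]


def decode_status(field: str) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Decode an AER 'status/mask' field such as '00001100/00002000'.

    Returns (set status bits, masked bits), each as (bit, name) pairs.
    """
    status_hex, mask_hex = field.split("/")
    return _bits(int(status_hex, 16)), _bits(int(mask_hex, 16))
```

For the log above, decode_status("00001100/00002000") yields bits 8 and 12 as set and bit 13 as masked. Both set bits are Data Link Layer replay conditions, which matches type=Data Link Layer in the log: the link layer retransmitted the packet, so the error was reported as Corrected.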

Thanks in advance

The Red Hat Enterprise Linux 6 Technical Notes suggest booting the kernel with the parameter

pci=noaer

so you might try that. But it looks to me like you have faulty hardware. If that kernel parameter doesn't help, I would lean toward a hardware problem.
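After rebooting, it is worth confirming that the parameter actually reached the kernel, since editing the wrong boot entry is an easy mistake. On Linux the running kernel's command line can be read from /proc/cmdline. Below is a minimal Python 3 sketch, assuming that file is available; the helper names cmdline_options and has_option are hypothetical, and the comma splitting assumes pci= options may be given as a comma-separated list (e.g. pci=noaer,nomsi).

```python
from typing import Dict, Optional, Set


def cmdline_options(cmdline: str) -> Dict[str, Set[str]]:
    """Map each kernel parameter name to the set of values it was given.

    Bare flags such as 'quiet' map to an empty set. Comma-separated values
    such as 'pci=noaer,nomsi' are split into individual options.
    """
    opts: Dict[str, Set[str]] = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'root=UUID=...' keeps its value intact.
        name, sep, value = token.partition("=")
        values = opts.setdefault(name, set())
        if sep:
            values.update(v for v in value.split(",") if v)
    return opts


def has_option(cmdline: str, name: str, value: Optional[str] = None) -> bool:
    """True if `name` (and, when given, `value`) is on the command line."""
    opts = cmdline_options(cmdline)
    if name not in opts:
        return False
    return value is None or value in opts[name]
```

Usage on the booted system: open /proc/cmdline, read it, and call has_option(text, "pci", "noaer").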

Thanks, I will give it a try.

Additionally, I included the following:

edd=off acpi=off 

Do you think it is OK to keep these parameters, or are they meant only for safe-mode booting?

I would not disable a bunch of features just for the heck of it. First, I would try "pci=noaer" by itself. PCI Express error reporting is new in Red Hat 6, and Red Hat's docs suggest turning it off to address a kernel hang somewhat similar to yours; that is why I suggested disabling it. If that doesn't work, I might try something else, but only one change at a time. Then, if it starts working, you know which change fixed it.
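To keep to one change per test boot, the kernel command lines of two attempts can be compared mechanically before rebooting. A minimal Python sketch under simplifying assumptions: the helper changed_params is hypothetical, parameters are compared by name, and a parameter that appears several times (such as multiple console= entries) is collapsed to its last occurrence.

```python
from typing import Dict, List


def _by_name(cmdline: str) -> Dict[str, str]:
    """Map parameter name -> full token (last occurrence wins)."""
    return {tok.partition("=")[0]: tok for tok in cmdline.split()}


def changed_params(old: str, new: str) -> List[str]:
    """Names of parameters added, removed, or given a different value."""
    a, b = _by_name(old), _by_name(new)
    return sorted(n for n in set(a) | set(b) if a.get(n) != b.get(n))
```

If changed_params(previous, next) returns more than one name, the next boot changes several things at once, and a successful result will not tell you which one helped. Adding "pci=noaer edd=off acpi=off" together, for example, reports three changed parameters.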