Problem with RP3440

Richard_Davies · March 13, 2012, 4:57pm

One of our customers has an RP3440 running HP-UX 11.11. It has run without a problem for 7 years. Now, today, a telnet connection takes at least 5 minutes to connect. If I log on at the console I can't ping localhost or the server name, even though both are in /etc/hosts. I can ping the IP addresses. netstat hangs. I have shut the server down and brought it up in single-user mode: fine. init 1: fine. init 2 takes 45 minutes and seems to spend a long time on services that are not even marked to start. There don't seem to be any hardware problems or errors: ioscan shows all expected hardware as claimed, and there are no errors in syslog.log or rc.log. At a loss what to do now.

Richard

vbe · March 13, 2012, 5:12pm

If the server is fine, you are looking in the wrong place...
Check your LAN interfaces, then do the same on the switch side: is there a mismatch between them?
If you are fixed at 100FD, don't leave the other side on autonegotiate...

Is /tmp (or /var/tmp...) full of small files? Yes, that does slow a server down. I once had an RP8400 that was nearly dying because of almost 100000 files in there... and rm * does not work because there are too many arguments...
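
The "too many arguments" failure comes from the shell expanding `*` into one enormous command line. A minimal sketch of a way around it, using only standard `find`, `wc` and `rm`. The function names `count_files` and `purge_old_files` are illustrative assumptions, not something from this thread:

```shell
# count_files DIR
# Print how many regular files live under DIR (subdirectories included).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# purge_old_files DIR DAYS
# Remove regular files under DIR last modified more than DAYS days ago.
# find hands the names to rm one by one, so no "*" is ever expanded.
purge_old_files() {
    find "$1" -type f -mtime +"$2" -exec rm -f {} \;
}
```

For example, `count_files /tmp` shows how bad it is, and `purge_old_files /tmp 7` (the 7-day cutoff is only an example) clears out the stale files. Look at what is in there before deleting anything on a production box.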

Richard_Davies · March 13, 2012, 5:53pm

Thanks for the quick response. I will check out the network cable and switch, if I can figure out where the cables go. It's not my site and their IT person is off sick, so crawling about tracing cables looks to be the order of the day. Of course nothing is labelled at the switch end of things.
The Oracle databases are shut down, and I am now taking a complete backup of the system, just in case.

Richard

vbe · March 14, 2012, 5:56am

Most important: what I said above holds if you see no extravagant load at the console, or worse, if you have the impression the system is living its own life and forgetting to work... (I had similar cases when a switch rebooted and negotiated badly:
Common: both ends agree on 100HD. It works fine but is slower; you don't notice until there is heavy load or backup times are no longer the same...
AIX: I once had autonegotiate on the AIX side fall back to 10FD after a network failure; unfortunately the switch could not deal with that...
HP would in most cases stay, if not at 100HD, at 100FD, but the switch did not follow...
But once I had bad luck and had one doing 10HD...)

If you do have a big load, then you have to chase the culprit, and if you have NFS mounts, look there first!
A common process that hogs a system when it goes berserk is dmisp, but there are a few others...

methyl · March 14, 2012, 9:14am

Off chance:
If you use DNS, have a look at /etc/resolv.conf and check that each DNS server mentioned is up and functioning.
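
A minimal sketch of that check, assuming `nslookup` is available and accepts the server to query as its second argument. The function names `list_nameservers` and `check_nameservers` are hypothetical helpers, not existing commands:

```shell
# list_nameservers [FILE]
# Print the address on every "nameserver" line (default: /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# check_nameservers NAME [FILE]
# Ask each listed server to resolve NAME. A dead server shows up as a long
# wait followed by a timeout instead of an answer.
check_nameservers() {
    for ns in $(list_nameservers "$2"); do
        echo "== $ns"
        nslookup "$1" "$ns"
    done
}
```

Running `check_nameservers` with the server's own name queries each configured DNS server in turn, so a single unreachable entry at the top of the list stands out.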

Memory?
Check with glance or top that you still have all the memory you expect.

Hardware?
Look at recent entries in /var/opt/resmon/log/event.log for excessive disc retries or other anomalies.

vbe · March 14, 2012, 9:44am

You should be able to ping even with a mismatch between the switch and server LAN ports, though with a lot of packet loss. Not being able to ping at all is more of a concern...
Check what methyl suggests, and from the box try to ping elsewhere: is that any better?
Then use the stm utility to check your hardware from the console, if it is that bad...
You don't have multiple LAN cards, do you?
If so, how are they configured?

Ports on switches can fail... Can someone look at the switch to see whether there are any free ports left that you could use, and plug the server in there?

mabdelmageid · March 18, 2012, 2:42pm

I have faced this issue, and discovered that someone had removed /etc/nsswitch.conf.
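
A quick way to check for this, as a minimal sketch; `hosts_policy` is a hypothetical helper name. Without the file the system falls back to its built-in lookup order, which may not consult /etc/hosts first, and that would fit the symptom of names in /etc/hosts failing to resolve:

```shell
# hosts_policy [FILE]
# Print the "hosts:" line from FILE (default: /etc/nsswitch.conf), or say
# that the file or the line is missing.
hosts_policy() {
    f="${1:-/etc/nsswitch.conf}"
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    grep '^hosts:' "$f" || { echo "no hosts: line in $f"; return 1; }
}
```

A line such as `hosts: files dns` means /etc/hosts is consulted before DNS.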

You can

Regards,
Mohamad

quimera · March 19, 2012, 10:30am

I had a similar problem: the DNS server was down, so every client (telnet, ssh, http, ...) was slow to connect.