Temporarily lose connection to AIX box

Hello All,

I have a strange problem and I'm hoping someone can help. I honestly don't know where else to look.

I have a number of AIX boxes in my environment, and for some reason I periodically lose connectivity to all services (except ping) on one of them. It happens sporadically and usually lasts about 10 minutes. During this time I cannot SSH, Telnet, or access the web server, but I can ping the box. I can, however, SSH to another box on the same subnet and from there access all the services on the problematic system as expected.

I've verified IP addresses (not duplicated), subnet masks, gateways, and ipsec. I've also looked at this from the network side (firewalls, routers, switches), and nothing is out of the ordinary.

Really stumped with this one.

I do not think there is an "out of the box" solution, and I hope you do not expect one from us.

Here are a few pointers on how to analyse the problem further:

  • is the system in question connected to an HMC? If so, is a VTerm still possible during such an outage?

  • what does the errpt (and perhaps other logs, like /var/log/messages , ...) say about the time in question? Are there any entries which could be connected to the symptom?

  • if the outages follow any recurring time pattern: have you looked at the cron jobs of all users to make certain it is not some job which causes the outage?

  • do you have some sort of monitoring? What does it say about the time of the outage and the time shortly before it? You might want to set up something and wait until the next occurrence: I'd start with vmstat , netstat and ps each piped into a file. You could clean that out daily if no outage has taken place.
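
The last point could be sketched roughly like this. It is only a minimal example, not a finished monitoring solution: the log directory, the file naming and the exact command options (vmstat 1 2, netstat -an, ps -ef) are assumptions for illustration and should be adapted to your system.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped snapshot of vmstat, netstat
# and ps output to a per-day log file.
# LOGDIR is an assumed location; override it in the environment if needed.
LOGDIR="${LOGDIR:-/tmp/outage-watch}"

snapshot() {
    mkdir -p "$LOGDIR" || return 1
    # One file per day, so days without an outage are easy to delete.
    logfile="$LOGDIR/snapshot.$(date +%Y%m%d).log"
    {
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        for cmd in "vmstat 1 2" "netstat -an" "ps -ef"; do
            echo "--- $cmd ---"
            # $cmd is left unquoted on purpose so it splits into command + args.
            $cmd 2>&1 || echo "(command failed or not available: $cmd)"
        done
    } >> "$logfile"
}

# Take one snapshot per run; schedule the script from cron.
snapshot
```

Run from cron (for example once a minute), this leaves a record of what the system looked like shortly before and during the next outage. Because each day gets its own file, the logs from uneventful days can simply be removed.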

I hope this helps.

bakunin

Thanks bakunin.

I wasn't looking for an out-of-the-box solution, but you did point me in the right direction.

I ran this command: errpt -J TS_LOC_DOWN_ST,TS_MISCFG_ER -a | more and it showed me that the system expected a particular IP, whereas the box was configured with a different one. Since I placed the expected IP on the eth0 interface, so far so good. I'm monitoring to see if the issue comes up again.

Thanks again.
