We have an AIX server with TSM installed.
This server uses en0 for administration and en1 for backups.
en0 subnet 10.10.10.x
en1 subnet 10.10.20.x
The issue is that, all of a sudden, the LPARs we are backing up lose connectivity to the AIX-TSM server. For example:
The en1 IP is 10.10.20.10, and a TSM LPAR client that we are backing up is 10.10.20.11.
The backup fails, and we found that LPAR client 10.10.20.11 can't ping 10.10.20.10. But if I go to the AIX-TSM server and ping from there, the ping works, and ping from the LPAR client starts working again.
It's as if a ping from the AIX-TSM box reactivates connectivity.
Something important to mention: we first noticed the problem while running backups, but today, while testing, connectivity was lost with nothing running, and pinging from the AIX-TSM server to the LPAR clients restored it.
First, do you have a VIOS or are you using HEAs in your LPAR?
Second, do you have any "energy saving" enabled in your system profiles? Believe it or not, there is such a thing, and the hardware might find it worthwhile to switch off a (seemingly) idle adapter to conserve energy.
I found something. In this PureFlex I also have Intel servers running VMware that host some Windows VMs. I configured a Windows VM in the same VLAN, and the Windows VM does not lose connectivity to the AIX-TSM IP.
I found something else with more testing: if I change the IP of the AIX-TSM server, the AIX LPARs can't ping it.
For example:
The AIX-TSM server IP is 10.10.20.10. If I change it to 10.10.20.30, the AIX LPARs can't ping it, while the Windows VM pings it without any problem.
To get the AIX LPARs to ping, I have to go to the AIX-TSM server and ping any IP in the 10.10.20.x segment.
So far it looks like an AIX issue...?
Should I add a static route or something? The thing is, the AIX LPAR I am testing with right now has only one interface, in the 10.10.20.x segment, unlike the production LPARs.
Like I said, I configured an extra AIX LPAR just for testing, with one NIC instead of two like the other LPARs being backed up.
This test AIX LPAR is 10.10.20.16, and this is the command output you are looking for:
# netstat -nr
Routing tables
Destination        Gateway           Flags   Refs     Use  If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
10.10.20.0         10.10.20.16       UHSb      0         0 en0      -      -   =>
10.10.20/24        10.10.20.16       U         0     18306 en0      -      -
10.10.20.16        127.0.0.1         UGHS      0       670 lo0      -      -
10.10.20.255       10.10.20.16       UHSb      2      1172 en0      -      -
127/8              127.0.0.1         U         6    184693 lo0      -      -

Route Tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1             UH        1     14560 lo0      -      -
By the way, the VIO in my p260 and the AIX-TSM Power 701 are both using EtherChannel.
Just to let you know, I removed the EtherChannel on the Power 701 to test further and got the same issue. I am not able to remove the EtherChannel on the p260 since it is in production, and I am sure the 10.10.10.x network has been working fine for quite some time.
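Since the question of whether a host has a gateway comes up in this thread, here is a minimal sketch for checking any `netstat -nr` output for a default route. The function name `default_gateway` is made up for illustration, and the sketch assumes the default route is listed with the literal word `default` in the Destination column. Note that traffic between two hosts inside 10.10.20.x uses the interface route, so a missing default route on its own would not explain the lost pings.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -nr` output on stdin and print the
# gateway of every default route found.
# Assumption: the default route shows "default" in the Destination column.
# Exit status is 0 if at least one default route exists, 1 otherwise.
default_gateway() {
    awk '$1 == "default" { print $2; found = 1 } END { exit !found }'
}

# Intended use on the host being checked:
#   netstat -nr | default_gateway || echo "no default route configured"
```

Run against the routing table pasted above, this would report that no default route is configured on the test LPAR.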
You are confusing me.
You showed a different server first, which does not have a gateway, and now you are showing another server.
Paste the output from the problematic server (mention the hostname and/or IP). I don't need output from others.
Are you saying you set up EtherChannel and configured a different subnet on each adapter? Have you done VLAN tagging? For multiple VLANs to exist on the same ent port you need VLAN tagging, NOT just EtherChannel.
The output I just pasted is from a problematic server, which has 2 NICs. I was using a "TEST" server with just one NIC in the 10.10.20.x segment, but to be realistic I requested access to one of the production servers having the issue, since it was our customer who found the issue.
this server is 10.10.10.23 - Production IP
10.10.20.11 - Backup IP
Yes, both ends have EtherChannels. I mean, the Power 701 (AIX-TSM) is using 2 adapters for EtherChannel, and yes, I used smitty vlan to create VLAN 500 for backups.
The VIO in the p260 (PureFlex), which hosts the production LPARs, is using EtherChannel, and VLANs are created there as well.
Yes, I can log in from AIX-TSM to 10.10.20.11 using ssh.
This is the traceroute:
Power-TSM:/> traceroute 10.10.20.11
trying to get source for 10.10.20.11
source should be 10.10.20.10
traceroute to 10.10.20.11 (10.10.20.11) from 10.10.20.10 (10.10.20.10), 30 hops max
outgoing MTU = 1500
1 10.10.20.11 (10.10.20.11) 1 ms 0 ms 0 ms
I have not tested with /etc/hosts, but do you think it will work?
Ah! OK, let me ask you this: have you looked at DNS?
nslookup <hostname of tsmserver> from client
also do the reverse nslookup
nslookup <IP of tsmserver> from client (x.x.20.11)
Is it resolving to the correct IP/hostname?
Check your /etc/resolv.conf file.
Also check your /etc/netsvc.conf file for the "hosts=xxxx" line. What is this value?
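To pull the "hosts" value out of /etc/netsvc.conf without reading the whole file, something like the sketch below should do. The function name `hosts_order` is made up for illustration, and the sketch assumes the usual `hosts = value, value` layout with `#` comment lines. If the value lists `local` ahead of `bind`, /etc/hosts should be consulted before DNS, which matters for the /etc/hosts test mentioned earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every "hosts = ..." line in a
# netsvc.conf-style file, with blanks removed (e.g. "local,bind").
# Assumption: comment lines start with '#'. The file defaults to /etc/netsvc.conf.
hosts_order() {
    file=${1:-/etc/netsvc.conf}
    grep -v '^[[:space:]]*#' "$file" |
        sed -n 's/^[[:space:]]*hosts[[:space:]]*=[[:space:]]*//p' |
        tr -d ' \t'
}

# Intended use on the client and on the TSM server:
#   hosts_order
```

No output means the file has no "hosts" line at all.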
Now the question is, have you set the subnet mask value correctly?
Show me the subnet mask value from client and also from TSM server
You can get it by running ifconfig en1 (you can grep for netmask if you want).
Value will be something like 0xffffff00 (this is an example)
Post the values from TSM server for 20.x network and also from client for 20.x network.
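Once both hex netmask values are posted, they can be compared with a small sketch like the one below. It converts the hex netmask printed by ifconfig to dotted decimal and checks whether two addresses land in the same network. The function names (`hex_to_dotted`, `network_of`, `same_network`) are made up for illustration, and the 0xffffff00 mask is only the example value from above, not the confirmed configuration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the netmask values on both hosts.

# Convert a hex netmask such as 0xffffff00 to dotted decimal.
hex_to_dotted() {
    hex=${1#0x}
    printf '%d.%d.%d.%d\n' \
        "0x$(echo "$hex" | cut -c1-2)" \
        "0x$(echo "$hex" | cut -c3-4)" \
        "0x$(echo "$hex" | cut -c5-6)" \
        "0x$(echo "$hex" | cut -c7-8)"
}

# Print the network address for an IPv4 address ($1) and dotted netmask ($2).
network_of() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2          # split both arguments into eight octets
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Succeed only if both hosts use the same mask and share a network.
# $1/$2 = IP and hex mask on one host, $3/$4 = IP and hex mask on the other.
same_network() {
    m1=$(hex_to_dotted "$2")
    m2=$(hex_to_dotted "$4")
    [ "$m1" = "$m2" ] || return 1
    [ "$(network_of "$1" "$m1")" = "$(network_of "$3" "$m2")" ]
}

# Example with the addresses from this thread and the example mask:
hex_to_dotted 0xffffff00                     # prints 255.255.255.0
if same_network 10.10.20.10 0xffffff00 10.10.20.11 0xffffff00; then
    echo "same network, same mask"
else
    echo "mask or network mismatch"
fi
```

If the two hosts report different masks, the check fails even when the addresses look alike, which is the misconfiguration being asked about here.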