Strange route issue in PowerHA 5.4?

rs6000er · June 24, 2009, 11:28am

Hi all,

We recently upgraded HACMP (PowerHA) from 5.2 to 5.4.

During failover testing, we found a strange network issue. After the standby node brought the service IP address (172.15.100.8) online on its standby NIC, we were able to log in to the standby node with telnet 172.15.100.8, which is hosted on the standby node's standby NIC.

When we tried to run ping or traceroute to an IP address outside our network, for example google.com or ibm.com, the NIC that holds the service IP address was not able to send any packets out at all.

Instead, the primary NIC that holds the boot address of the standby node (172.15.103.79) took over and sent the ping or traceroute packets to the outside world.

On the other hand, all tests within our intranet behaved normally. We checked the DNS server, the routing table of the standby node, and the gateway setup; they all looked good.

The following is the routing configuration from the standby node.

Routing tables
Destination      Gateway          Flags  Refs  Use      If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          172.15.100.150   UG     1     28215    en4  -    -
127/8            127.0.0.1        U      9     921983   lo0  -    -
172.15.100.0     172.15.103.79    UHSb   0     0        en4  -    -   =>
172.15.100/22    172.15.103.79    U      6     4212671  en4  -    -
172.15.103.79    127.0.0.1        UGHS   0     671937   lo0  -    -
172.15.103.255   172.15.103.79    UHSb   0     288      en4  -    -
172.16.60.0      172.16.60.60     UHSb   0     0        en5  -    -   =>
172.16.60/22     172.16.60.60     U      2     2712283  en5  -    -
172.16.60.60     127.0.0.1        UGHS   0     382645   lo0  -    -
172.16.63.255    172.16.60.60     UHSb   0     1        en5  -    -

Route Tree for Protocol Family 24 (Internet v6):
::1              ::1              UH     0     236      lo0  -    -

As you can see, the default route is set up on en4 (the primary NIC that holds the boot IP address). Is it possible to manually add a route entry so that en5 (the standby NIC that will hold the service IP address after failover) can reach the outside as well, like this?

Routing tables
Destination      Gateway          Flags  Refs  Use      If   Exp  Groups

Route Tree for Protocol Family 2 (Internet):
default          172.15.100.150   UG     1     28217    en4  -    -
127/8            127.0.0.1        U      10    922176   lo0  -    -
172.15.100.0     172.15.103.79    UHSb   0     0        en4  -    -   =>
172.15.100/22    172.15.103.79    U      6     4213902  en4  -    -   =>
172.15.100/22    172.15.100.8     UG     0     6        en5  -    -
172.15.103.79    127.0.0.1        UGHS   0     672133   lo0  -    -
172.15.103.255   172.15.103.79    UHSb   0     288      en4  -    -
172.16.60.0      172.16.60.60     UHSb   0     0        en5  -    -   =>
172.16.60/22     172.16.60.60     U      2     2712830  en5  -    -
172.16.60.60     127.0.0.1        UGHS   0     382726   lo0  -    -
172.16.63.255    172.16.60.60     UHSb   0     1        en5  -    -

Route Tree for Protocol Family 24 (Internet v6):
::1              ::1              UH     0     236      lo0  -    -

shockneck · June 24, 2009, 1:23pm

The problem is not related to the HACMP release but to the fact that you set the default route for an aliased network at boot time, when HACMP is not yet active. To get around this routing problem, you could set the routes from the Application Server, or you could configure Persistent IP Labels on the same network as the Service Label on the cluster nodes. In the second case, choose Collocate with Persistent IP as the policy to avoid follow-up routing problems.
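
To illustrate why outbound traffic leaves through en4, here is a minimal Python sketch of a longest-prefix-match lookup over the network routes from the first routing table in the question. It is only a simplified model, not how AIX implements routing: the host and broadcast entries are left out, the function name egress_route is made up for this example, and 203.0.113.10 is just a documentation address standing in for an external host.

```python
import ipaddress

# Network routes transcribed from the first netstat listing in the question.
# Host (UGHS) and broadcast (UHSb) entries are omitted to keep the model small.
ROUTES = [
    ("0.0.0.0/0", "172.15.100.150", "en4"),       # default
    ("127.0.0.0/8", "127.0.0.1", "lo0"),          # 127/8
    ("172.15.100.0/22", "172.15.103.79", "en4"),  # 172.15.100/22
    ("172.16.60.0/22", "172.16.60.60", "en5"),    # 172.16.60/22
]


def egress_route(dest):
    """Return (gateway, interface) of the most specific route matching dest."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, gateway, ifname in ROUTES:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, ifname)
    return (best[1], best[2]) if best else None


if __name__ == "__main__":
    # Hypothetical external destination: only the default route matches.
    print(egress_route("203.0.113.10"))
    # The service address itself falls inside 172.15.100/22, which is tied to en4.
    print(egress_route("172.15.100.8"))
```

In this model, every destination outside the two local /22 networks resolves to the default route on en4, which matches the behaviour described in the question: the lookup depends on the destination, not on which NIC currently carries the service alias.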