Network Connectivity lost after reboot

I have 4 V440 servers running Solaris 9. Their interfaces (ce0) are configured and have connectivity to our network. However, after a reboot the connectivity is lost, although ifconfig -a shows that the interface is still up.

Connectivity is restored only after I reconfigure the interface with ifconfig commands from the command line.

Would appreciate any help.

Storageguy

Please post an excerpt from your /var/adm/messages file.
Maybe auto-negotiation isn't working correctly. You can try hard-coding the speed and duplex settings on both the V440 and your switch.

Check that there is a host entry in /etc/hosts for whatever you have in /etc/hostname.ce0
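The check described above can be sketched as a small shell function. This is only an illustration: the function name host_entry_ok is made up here, and it assumes the hostname file holds a host name (not a literal IP address) as the first word of its first line.

```shell
# Return 0 if the name in a hostname.<interface> file has an entry in a hosts file.
# Paths default to the ones named in the thread; both can be overridden.
host_entry_ok() {
    hostname_file=${1:-/etc/hostname.ce0}
    hosts_file=${2:-/etc/hosts}
    # The first word of the first line is taken as the host name.
    name=`sed 1q "$hostname_file" | awk '{print $1}'`
    [ -n "$name" ] || return 1
    # Look for that name among the names (fields 2..NF) of non-comment lines.
    awk '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
         END { if (found) exit 0; exit 1 }' n="$name" "$hosts_file"
}
```

Called with no arguments it checks /etc/hostname.ce0 against /etc/hosts, for example: host_entry_ok && echo "entry found".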

Also, if you have a subnet mask that isn't one of the old standard class A, B or C masks, it needs to be in /etc/netmasks.
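An /etc/netmasks entry pairs a network number with its mask. As a rough aid, the sketch below computes the network number by ANDing an address with a mask. The function name network_of is hypothetical, the address is a documentation address rather than one from this thread, and it assumes a POSIX-style shell such as ksh or bash, not the older Bourne /bin/sh. Whether your site keys entries on the subnet number or on the classful network number is worth checking against the netmasks(4) man page.

```shell
# Print the network number obtained by ANDing an IPv4 address with a netmask.
network_of() {
    ip=$1
    mask=$2
    oldifs=$IFS
    IFS=.
    set -- $ip $mask          # split both dotted quads into $1..$8
    IFS=$oldifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Example with a documentation address (192.0.2.0/24 is reserved for examples):
network_of 192.0.2.200 255.255.255.128    # prints 192.0.2.128
```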

Also, just in case, check that your TCP/IP startup files are present:

/etc/rc2.d/S69inet
/etc/rc2.d/S72inetsvc

Networkfre@k, thanks for your reply.....both server and switch are set to 100 Full Duplex.

Unbeliever, thanks for your reply. Both /etc/hosts and hostname.ce0 are correct. I have the subnet listed in the /etc/netmasks file.

/etc/rc2.d/S69inet and /etc/rc2.d/S72inetsvc are there.

Check that the file name /etc/hostname.ce0 ends in ce(zero) and not a capital O, which would cause the system not to find the hostname/IP to set up on ce0.

Check that both startup scripts mentioned are executable by root (and owned by root).
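Both checks can be scripted. The following is a minimal sketch with made-up function names (suspect_hostname_files, check_rc_script), written to avoid anything beyond plain Bourne shell syntax. It only reports what it finds and changes nothing.

```shell
# List hostname.* files in a directory whose name does not end in a digit,
# e.g. hostname.ceO (letter O) instead of hostname.ce0 (zero).
suspect_hostname_files() {
    dir=${1:-/etc}
    for f in "$dir"/hostname.*; do
        [ -f "$f" ] || continue
        case $f in
            *[0-9]) ;;                        # ends in an instance number: fine
            *) echo "suspect name: $f" ;;
        esac
    done
}

# Report whether a startup script exists, is executable, and is owned by uid 0.
check_rc_script() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "$f: missing"
        return 1
    fi
    [ -x "$f" ] || echo "$f: not executable"
    owner=`ls -ln "$f" | awk '{print $3}'`
    [ "$owner" = "0" ] || echo "$f: owner uid is $owner, not 0"
    return 0
}
```

For the files named in this thread that would be: suspect_hostname_files /etc, then check_rc_script /etc/rc2.d/S69inet and check_rc_script /etc/rc2.d/S72inetsvc.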

Setting the ce interface to specific required settings:
The commands for the ce fiber card (Sun GigaSwift 1.0) are:

ndd -set /dev/ce instance 0            # settings for ce0
ndd -set /dev/ce adv_1000hdx_cap 0     # no half-duplex
ndd -set /dev/ce adv_pause_cap 1
ndd -set /dev/ce adv_asmpause_cap 0    # send/receive pause frames
ndd -set /dev/ce adv_autoneg_cap 0     # turn auto-negotiation off

These should be in a startup file; change them to the specific settings required by your site. Always check that the interface is truly up by looking at the ndd parameter link_status (ifconfig can state UP, but that does not mean the connection is up).
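As an illustration of such a startup fragment, here is a sketch for the 100 Mbit full-duplex setting mentioned earlier in the thread. The function names and the NDD override variable are invented for this example, the parameter list should be checked against the ce driver documentation for your card, and link_status is assumed to read 1 when the link is up.

```shell
# NDD may be overridden (e.g. NDD=echo) to preview commands without applying them.
NDD=${NDD:-/usr/sbin/ndd}

# Force one ce instance to advertise only 100 Mbit full duplex.
force_100fdx() {
    inst=$1
    $NDD -set /dev/ce instance "$inst"      # select which ce instance to modify
    $NDD -set /dev/ce adv_1000fdx_cap 0
    $NDD -set /dev/ce adv_1000hdx_cap 0
    $NDD -set /dev/ce adv_100fdx_cap 1      # the only capability left enabled
    $NDD -set /dev/ce adv_100hdx_cap 0
    $NDD -set /dev/ce adv_10fdx_cap 0
    $NDD -set /dev/ce adv_10hdx_cap 0
    $NDD -set /dev/ce adv_autoneg_cap 0     # auto-negotiation off, set last
}

# Print "up" or "down" for a ce instance, based on the link_status parameter.
link_state() {
    inst=$1
    $NDD -set /dev/ce instance "$inst"
    if [ "`$NDD -get /dev/ce link_status`" = "1" ]; then
        echo up
    else
        echo down
    fi
}
```

Running force_100fdx 0 with NDD=echo prints the ndd arguments instead of applying them, which is a cheap way to review the fragment before putting it in a startup file.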

After rebooting the server, instead of manually putting in the correct values for the server, enter the ifconfig command as:

ifconfig ce0 + netmask +

This tells ifconfig to use the values in the configuration files for this interface. Do not plumb the interface before doing this. If it is not plumbed and you get an error, then the problem is most likely caused by hostname.ce0 having an O instead of a zero, as described by RTM.

If, however, this works but configures the interface incorrectly, the netmask is most probably what is wrong. Solaris default is a class B subnet (ffff0000); if you have a class C it should be ffffff00, etc. If the netmask is incorrect, add the appropriate entry and try again with the command given above.

Out of interest has it ever worked correctly after a reboot? Also have you recently changed the switch to which it is connected?

We've had problems before with Suns and auto-negotiation with switches, although we've never had it auto-negotiate to not up at all :) We just hard-code the switches to the relevant speed now.

If you can, and are allowed to, posting the output of ifconfig -a before and after running reborg's suggested command would tell us a lot.

REBORG,...ifconfig ce0 + netmask + produces the following output.

ifconfig: + :bad address

I have a zero in /etc/hostname.ce0. Same as the other 10 V440s.
Netmask is correct. Same on all servers. 255.255.255.128
6 of these machines reboot and keep connectivity. 4 do not.

RTM,....startup scripts are owned by root. The systems are
already set to the ndd settings you suggested below.

Unbeliever,..../etc/hosts files are consistent with hostname.ce0
start up files. No,...these 4 machines have never worked after reboot.
All 4 are hard set to 100 full dup. So is the switch.

Running out of ideas here ...

Are all the machines at the same patch levels? If not, check to see if there are any network-specific patches missing on the machines that don't work.
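One way to compare patch levels is to save the output of showrev -p on a working machine and on a failing one, then list the patch IDs that only the working machine has. The sketch below assumes each line of the saved output begins with "Patch:" followed by the patch ID. The function name missing_patches and the file names in the usage note are placeholders.

```shell
# Print patch IDs present in the first saved `showrev -p` listing
# but absent from the second.
missing_patches() {
    good=$1
    bad=$2
    awk '$1 == "Patch:" {print $2}' "$good" | sort -u > /tmp/patches.good.$$
    awk '$1 == "Patch:" {print $2}' "$bad"  | sort -u > /tmp/patches.bad.$$
    comm -23 /tmp/patches.good.$$ /tmp/patches.bad.$$   # lines only in the first list
    rm -f /tmp/patches.good.$$ /tmp/patches.bad.$$
}
```

For example, after running showrev -p > /tmp/working.txt on a good host and showrev -p > /tmp/failing.txt on a bad one, missing_patches /tmp/working.txt /tmp/failing.txt lists the candidates to look up.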

As a hack you could create a new startup file

/etc/rc2.d/S99startnet

and put in it the commands you need to do manually once the system is up and running. This, however, is a rather horrid solution to something which should be fixable.

Check whether the /etc/netconfig file exists.

Confirm that you checked the startup files for rwx.

Post any errors being logged into your messages file during boot - if you don't have errors being logged, turn them on in syslog.conf.

Give more information on the system(s) relevant to the situation - such as, are the interfaces being used for clusters? Is there more than one interface on these servers?

RTM,....all 4 machines have 2 nics. There are no relevant errors logged. These machines are not clustered. They are individual DNS servers.

PPASS,....netconfig exists on all machines.

I tried creating a new startup script as Unbeliever suggested.
After rebooting,...there was still no access to network until I downed
interface ce1. Ce0 could ping and respond to pings.

Would like to thank everyone for your help on this issue.

Please be more informative - this sounds to me like you downed ce1 and ce0 started working - is that the case?

(And you still haven't stated if the startup scripts are set to rwx for owner) :)

RTM,....all the startup scripts have rwx permissions for root.
And yes,..when I downed ce1,...ce0 started working.

Your original post didn't mention anything about multiple interfaces. Are you actually using both ce0 and ce1? Is the ce1 interface even plugged into the network? Are they plugged into the same network for automatic failover?

Unbeliever,..There was only one interface when I started this post. Now there are 2,.....on the same network, but no autofailover. Both are plugged in.

Are both interfaces using the same Ethernet address?

ifconfig -a

will show you all the info. If the interfaces have different IP addresses and the same ethernet address this may cause the problems you are experiencing.
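Spotting a shared Ethernet address by eye is easy to get wrong with several interfaces, so here is a small sketch that reads ifconfig -a output and prints any address used by more than one interface. The function name duplicate_ethers is made up, and it assumes the usual Solaris layout: an interface header line whose second field starts with "flags=", followed by an "ether" line (which ifconfig typically shows only when run as root).

```shell
# Read `ifconfig -a` output on stdin and print each Ethernet address that
# appears on more than one interface, followed by the interfaces sharing it.
duplicate_ethers() {
    awk '
        $2 ~ /^flags=/ { ifname = substr($1, 1, length($1) - 1) }  # "ce0:" -> "ce0"
        $1 == "ether" {
            count[$2]++
            names[$2] = names[$2] " " ifname
        }
        END {
            for (mac in count)
                if (count[mac] > 1) print mac names[mac]
        }
    '
}
```

Usage would be ifconfig -a | duplicate_ethers; no output means every interface reported a distinct address.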

Unbeliever,...all interfaces have different ethernet addresses. I changed the setting from the boot prom a while back. IPs are not the same.

To reiterate, I configured the interfaces on all 10 of these machines the same way. Only 4 have this problem. After reboot, both interfaces appear to be up if you do ifconfig -a. However, they are not pingable, nor can I ping out.
Only after I do the following do they become accessible.

ifconfig ce0 down
ifconfig ce0 xxx.xxx.xxx.xxx netmask 255.255.255.128
ifconfig ce0 up

(same sequence for the second NIC, which is ce1)

Would creating a new startup script like S99startnet work? Will it override whatever is taking place now at startup? If so,..what can I include in this script? Yes,.....desperate measure,..but doing what I have to do as these machines will sure have to be rebooted at some point.

Creating /etc/rc2.d/S99startnet should work yes. It should contain exactly the commands you need to run from the command line to get the interfaces working.
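A sketch of what such a file might look like, built only from the manual sequence posted above. The addresses are left as the same xxx placeholders and must be filled in. The function name reconfigure_if and the IFCONFIG override variable are invented for this example, and the syntax is kept to plain Bourne shell since rc scripts are run by /sbin/sh.

```shell
#!/sbin/sh
# Sketch of /etc/rc2.d/S99startnet: repeat at boot the manual sequence that
# restores connectivity. IFCONFIG can be set to echo for a dry run.
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}
NETMASK=255.255.255.128

# Take an interface down, reassign its address and netmask, and bring it up.
reconfigure_if() {
    ifname=$1
    addr=$2
    $IFCONFIG "$ifname" down
    $IFCONFIG "$ifname" "$addr" netmask "$NETMASK"
    $IFCONFIG "$ifname" up
}

case "$1" in
start)
    reconfigure_if ce0 xxx.xxx.xxx.xxx    # placeholder: ce0 address
    reconfigure_if ce1 xxx.xxx.xxx.xxx    # placeholder: ce1 address
    ;;
esac
```

Because it runs as S99, after the standard network scripts, it re-applies the settings on top of whatever they did rather than replacing them.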

However, in your position I would be using it only as a temporary fix in order to get things working. I would not suggest leaving it like that. You really need to find out what is causing the problem.

Please reply if you solve this. I've had the same problem a couple of times and have never been able to solve it with anything other than a reinstall of Solaris.