NFS server <servername> not responding still trying

Hi gurus,

OS = SunOS 5.8

Not sure whether to post this in the scripting forum or in advanced and expert. Am posting in both since there are two things I am trying to achieve.

Am currently getting the NFS error below. At this stage, I am not sure whether I am having a SAN storage issue or a network issue.

NFS server <servername> not responding still trying

Am leaning towards a SAN storage issue at the moment. The reason I say this is that there are about 10-15 NFS mounts from this server and the error is only happening on some of them, not all.

If I do a df -k, it lists the filesystems and then stops as soon as it hits one with the NFS server error. For the time being, am wanting to know how to isolate which mount points are having NFS issues. Is there a command I can run that will report which mount points NFS is having problems with, instead of running df, which hangs midway?

I thought about writing a script that scans /etc/mnttab and runs a df on each filesystem listed there, but unfortunately when it reaches the one it is having problems with, the script stalls and cannot continue. Is it possible to put a "timer" on the df <filesystem>, so that if it takes more than 10 secs it gets terminated and the script continues with the df of the next filesystem?

To illustrate what am wanting to do, say for example the /etc/mnttab file has the following mount entries:

/etc/mnttab example:

/nas_mnt/u01
/nas_mnt/u02
/nas_mnt/u03
/nas_mnt/u04
/nas_mnt/u05

I want to have a script that does ...

while read mnt
do
   df -k ${mnt}
done < /etc/mnttab
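
(I realise the real /etc/mnttab has several fields per line - device, mount point, fstype, options, time - not just a mount point as in my simplified example, so I would probably need to pull out just the mount points of the nfs entries first, something like:

 # keep only the nfs entries and print their mount point (2nd field)
 awk '$3 == "nfs" { print $2 }' /etc/mnttab |
 while read mnt
 do
    df -k "${mnt}"
 done

but that still has the same hanging problem, which is why I want the "timer" described below.)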

So what am wanting to achieve is giving each df maybe only 10 seconds, and if it does not respond, terminate it and process the next one in the list.

This is what I have in mind. Will it work?

df.sh:

 #!/bin/sh
 # write this script's PID to a lock file so the caller can tell df is still running
 echo "$$" > df.lock
 df -k "$1"
 # df came back, so clear the lock
 rm -f df.lock

and the driver loop:

 while read mnt
 do
    ./df.sh "${mnt}" &
    sleep 10
    # if df.lock still exists after 10 seconds, df is stuck on that mount point
    if [ -f df.lock ]; then
       kill -9 `cat df.lock`
       rm -f df.lock
    fi
 done < /etc/mnttab
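
A simpler variant I am also toying with, again assuming the one-mount-point-per-line list from my example: skip the lock file, run df in the background, remember its PID with $!, and kill it if it is still around after 10 seconds. I gather a df stuck on a hard NFS mount may sit in uninterruptible sleep and ignore even kill -9, but at least the loop would keep going and the stuck mount point gets reported.

 while read mnt
 do
    df -k "${mnt}" &
    df_pid=$!
    sleep 10
    # kill -0 only tests whether the process is still there
    if kill -0 ${df_pid} 2>/dev/null; then
       echo "df hung on ${mnt} - possible bad NFS mount"
       kill -9 ${df_pid} 2>/dev/null
    fi
 done < /etc/mnttab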
   

On the other hand, of course it would be best if there is a command that I don't know about that can check which of the NFS mounts are having problems and which aren't.
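
The closest thing I have come up with so far is to probe the NFS service on each server from the client with rpcinfo instead of touching the mounts themselves, something like the sketch below (the server name is whatever sits before the ':' in the device field of the nfs entries), though I am not sure it would catch a mount that is broken while the server itself still answers:

 # ask each distinct NFS server whether its nfs service answers over UDP
 awk '$3 == "nfs" { split($1, a, ":"); print a[1] }' /etc/mnttab | sort -u |
 while read srv
 do
    rpcinfo -u "${srv}" nfs || echo "${srv}: nfs not responding to rpcinfo"
 done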

Any response / feedback will be much appreciated. Thanks in advance.

Are the mounts that are failing on the same network/subnet? You may find that you are having issues with the network and not really with the client. Have you tried mounting these via TCP instead of UDP? If TCP stops the alerts, I would take a look at the network.
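
Something along these lines on the client, syntax from memory so double-check against your mount_nfs man page, and the export path here is just a guess - use whatever the share really is:

 # remount one of the problem shares over TCP instead of UDP
 umount /nas_mnt/u01
 mount -F nfs -o proto=tcp,vers=3 <servername>:/export/u01 /nas_mnt/u01

The same proto=tcp option can go in the mount options column of /etc/vfstab to make it permanent.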

If it's a SAN issue, you should check whether multipathing is configured for the failing mountpoint / disk(s).
Perhaps the administrator forgot to configure a path, or one of your FC cards died and it cannot see the LUN(s) used for this mountpoint.
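
If the server uses Sun FC HBAs, something like this run on the NFS server might show whether a path or LUN has gone away (assuming luxadm is available; Veritas DMP or other multipath software has its own equivalents):

 # list the FC devices the server can see and the state of each HBA port
 luxadm probe
 luxadm -e port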

Also check the clients; you may want to umount that share from every client that is using the broken NFS share.

Is there anything in syslog?
Basically, NFS issues are clearly visible in the client's syslog and disk problems should show up in the server's syslog.

Using open source tools (or your own scripts) you can parse those logs on multiple servers and generate notifications (email, web or whatever).
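
As a rough example of what I mean (the pattern and the mail address are just placeholders):

 # flag NFS trouble from the client's syslog and mail it to the admins
 grep "NFS server" /var/adm/messages | grep "not responding" > /tmp/nfs_errs.$$
 if [ -s /tmp/nfs_errs.$$ ]; then
    mailx -s "NFS problems on `hostname`" admin@example.com < /tmp/nfs_errs.$$
 fi
 rm -f /tmp/nfs_errs.$$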