After umount -lf: kernel: nfs: server HOSTNAME not responding, timed out

Greetings!

I'm testing a failover solution for NFSv4 on RHEL6 with latest updates.
My script unmounts the faulty NFS share (umount -lf /share) when it detects that the share is hanging on the client (the NFS daemon is down on the NFS server), and then mounts the share from another, healthy NFS server.
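For readers following along, here is a minimal sketch of what such a check-and-remount step could look like. It is not the actual script from this post: the mount point, the backup export name and the 5-second limit are all assumptions chosen for illustration.

```shell
# Hypothetical values: adjust to your own environment.
MOUNTPOINT=/share
BACKUP_EXPORT=backup-server:/export

# Return 0 if a stat on the path ($1) completes within $2 seconds.
# SIGKILL is used because a task blocked on NFS may not react to SIGTERM.
share_is_responsive() {
    timeout -s KILL "$2" stat -t "$1" >/dev/null 2>&1
}

# Lazily force-unmount the dead share, then mount the backup export.
fail_over() {
    umount -lf "$MOUNTPOINT" &&
        mount -t nfs4 "$BACKUP_EXPORT" "$MOUNTPOINT"
}

# Example wiring (needs root, so it is left commented out):
# share_is_responsive "$MOUNTPOINT" 5 || fail_over
```

Note that a path that does not exist also counts as "not responsive" here, so a real script would probably want to tell those two cases apart.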

Sometimes a process or thread seems to stay in memory and keeps producing the message from the subject line: 'kernel: nfs: server SERVER not responding, timed out'.
I already have the new healthy NFS share from another server and 'nfsstat -m' shows that this is the only share on the NFS client system.

I've tried the following commands to find the stuck process/thread:

  • lsof -i | egrep 'SERVER|SERVER_IP'
  • lsof +d /share
  • fuser -fvm /share
  • netstat -anop | egrep 'SERVER|SERVER_IP'
  • ... and a couple of others with lsof, but none of them returned a PID.

Could it be a thread? The thread-related options of the ps command don't show IP addresses or host names, but the kernel keeps logging the error.
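One more thing that might be worth trying is to list threads in uninterruptible sleep (state D) whose kernel wait channel mentions nfs or rpc. This is only a sketch: the helper name filter_nfs_blocked is made up for this example, and if the messages come from the kernel's own RPC tasks rather than from a user process, nothing may show up at all.

```shell
# Read `ps` output (columns: pid, lwp, stat, wchan, comm) on stdin and keep
# only tasks in state D that are waiting inside an NFS or RPC function.
filter_nfs_blocked() {
    awk '$3 ~ /^D/ && $4 ~ /nfs|rpc/ { print }'
}

# List every thread (-L) with its wait channel and apply the filter.
list_nfs_blocked() {
    ps -eLo pid,lwp,stat,wchan:32,comm | filter_nfs_blocked
}
```

Running list_nfs_blocked on the client prints one line per matching thread, or nothing if no task is blocked that way.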

Any suggestions on how to detect it are welcome.

Thank you Arsene

How are you mounting your NFS share?

If you don't specify hard,intr as mount options, the client process cannot be killed or interrupted.

rw,noexec,nosuid,nodev,bg,soft,intr,rsize=65536,wsize=65536


Hi,

I forgot to mention but I don't restart the NFS client during the failover.

Try removing the bg option.

From the NFS manual:

                      If the bg option is specified, a timeout or failure
                      causes the mount(8) command to fork a child which
                      continues to attempt to mount the export. The parent
                      immediately returns with a zero exit code. This is
                      known as a "background" mount.
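To check whether such a background child is still retrying the old server, something along these lines could be used. It is a rough sketch that assumes the child shows up with mount.nfs in its command line; the helper names are invented for this example.

```shell
# Read `ps -eo pid,args` output on stdin and keep only lines whose command
# line mentions the mount.nfs helper.
filter_bg_mounts() {
    awk '/mount\.nfs/ { print }'
}

# Show any mount.nfs processes currently running on the client.
list_bg_mounts() {
    ps -eo pid,args | filter_bg_mounts
}
```

If list_bg_mounts prints a line that still names the failed server, that process is a likely candidate for the repeated "not responding" messages.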

Hi Arsene,

Could you please share how you achieved the NFS failover?