Unusual NFS mount problem on only ONE client: Red Hat WS Rel 3

neelpert1 · January 16, 2009, 1:46pm

This is an unusual situation where I have an NFS server currently serving MULTIPLE clients across several variants of Linux and UNIX successfully (world permissions) except for a SINGLE client. Even the other Linux (SuSE) clients in the same room are mounting successfully with defaults without incident. Server and workstation are on the same subnet with NO firewall and (I believe) no routers between them. The "traceroute" command shows a single hop directly to the server and nothing in between.

THIS client 'just stopped working' and the only error from the /var/log/messages file indicates that it "could not contact the NFS server: operation timed out."

It all gets fixed if I use 'tcp' as the mount option, both in /etc/fstab and when mounting manually, bypassing UDP altogether. However, this client had been working up until this week and AFAIK no changes were made to it. Again, other Linux clients IN THE SAME ROOM and ON THE SAME SWITCH are doing just fine and not having this problem!

Even the client system itself mounts two other NFS file systems from a different server (an IRIX server, believe it or not) without this workaround, but it cannot mount from the SuSE server without using the 'tcp' flag to bypass UDP.
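
One way to narrow down the UDP-versus-TCP difference described above is to probe the server's RPC services over each transport separately from the affected client. The following is a minimal sketch, not a definitive diagnostic: `probe_rpc` is a hypothetical helper name, and it relies only on the standard `rpcinfo -u` and `rpcinfo -t` options, which call the NULL procedure of a program over UDP and TCP respectively.

```shell
# Probe one RPC program on a server over UDP and then over TCP.
# This only defines a function; nothing is sent until you call it.
probe_rpc() {
    host=$1 prog=$2 vers=$3
    for flag in -u -t; do
        case $flag in
            -u) proto=udp ;;
            -t) proto=tcp ;;
        esac
        # rpcinfo -u / -t reports whether the program answers on that transport
        if rpcinfo "$flag" "$host" "$prog" "$vers" >/dev/null 2>&1; then
            echo "$prog v$vers $proto: ok"
        else
            echo "$prog v$vers $proto: FAILED"
        fi
    done
}

# Example calls (run on the client that fails to mount):
#   probe_rpc hardhead nfs 3
#   probe_rpc hardhead mountd 3
```

Running the same probes against the server that still mounts over UDP gives a baseline for comparison. Note that these are very small requests, so a passing UDP probe does not rule out trouble with larger UDP datagrams.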

Any clues here? I've googled this particular issue in bug reports but to no avail. Here's the output from the client side of what's being served:

rpcinfo -p hardhead
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
390436 1 tcp 9907
390435 1 tcp 9890
390113 1 tcp 7937
390115 1 tcp 8494
100024 1 udp 32771 status
100021 1 udp 32771 nlockmgr
100021 3 udp 32771 nlockmgr
100021 4 udp 32771 nlockmgr
100024 1 tcp 42410 status
100021 1 tcp 42410 nlockmgr
100021 3 tcp 42410 nlockmgr
100021 4 tcp 42410 nlockmgr
390103 2 tcp 9801
390109 2 tcp 9801
390110 1 tcp 9801
390120 1 tcp 9801
390109 2 udp 9715
391060 1 tcp 850
390107 5 tcp 9289
390107 6 tcp 9289
390402 1 tcp 9011
390105 5 tcp 9164
390105 6 tcp 9164
390433 1 tcp 9598
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100005 1 udp 805 mountd
100005 1 tcp 806 mountd
100005 2 udp 805 mountd
100005 2 tcp 806 mountd
100005 3 udp 805 mountd
100005 3 tcp 806 mountd
391029 1 tcp 46557
391030 1 tcp 54937
390104 105 tcp 9420
390430 1 tcp 8334
390429 101 tcp 9183

As you can see, NFS versions 2 and 3 are available over both UDP and TCP.
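
With this many registered programs, the NFS and mountd lines are easy to miss. As a rough sketch, the listing can be filtered down to just program 100003 (nfs) and 100005 (mountd); `nfs_services` is a hypothetical helper name, and it assumes the five-column `rpcinfo -p` layout shown above.

```shell
# Summarise which version/transport pairs are registered for NFS (100003)
# and mountd (100005). Reads "rpcinfo -p" output on stdin.
nfs_services() {
    # Columns: program, version, protocol, port, service name
    awk '$1 == 100003 || $1 == 100005 { print $5, "v" $2, $3 }' | LC_ALL=C sort -u
}

# Example call:
#   rpcinfo -p hardhead | nfs_services
```

Each output line has the form `service vN proto`, which makes it quick to compare what two servers offer.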

otheus · January 16, 2009, 7:11pm

As I understand it, TCP is better anyway, provided that TCP is tuned on your network for LAN access (see TCP/IP tuning guides).

Still, that doesn't solve the problem.

You didn't mention what was running on the server. What OS, version, distribution, and kernel version, please? On the client, what does "uname -a" report? Also what version of NFS is being used on each? You can use "rpm -qi nfs-tools" or similar to get that information.

neelpert1 · January 19, 2009, 11:25am

uname -a reveals the following from the server:

Linux hardhead 2.6.16.54-0.2.5-smp #1 SMP Mon Jan 21 13:29:51 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux

and the client:

Linux gvic3 2.4.21-15.ELsmp #1 SMP Thu Apr 22 00:18:24 EDT 2004 i686 i686 i386 GNU/Linux

The server is running SuSE and the client is running Red Hat.

I know the server is "serving out" NFS versions 2 and 3 (see rpcinfo from above) but I'm not 100% sure what the client is using. rpcinfo for the client reveals the following:

[root@gvic3 root]# rpcinfo -p
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 32768 status
100024 1 tcp 32768 status
100021 1 udp 32769 nlockmgr
100021 3 udp 32769 nlockmgr
100021 4 udp 32769 nlockmgr
100021 1 tcp 32769 nlockmgr
100021 3 tcp 32769 nlockmgr
100021 4 tcp 32769 nlockmgr
100007 2 udp 1004 ypbind
100007 1 udp 1004 ypbind
100007 2 tcp 1007 ypbind
100007 1 tcp 1007 ypbind
391002 2 tcp 32770 sgi_fam
100011 1 udp 739 rquotad
100011 2 udp 739 rquotad
100011 1 tcp 758 rquotad
100011 2 tcp 758 rquotad
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100005 1 udp 755 mountd
100005 1 tcp 770 mountd
100005 2 udp 755 mountd
100005 2 tcp 770 mountd
100005 3 udp 755 mountd
100005 3 tcp 770 mountd

I tried the rpm command but didn't get anything under the 'nfs-tools' query.

otheus · February 9, 2009, 4:30am

It occurred to me that both SuSE and Red Hat distributions have automatic updates. It's possible they automatically updated some server tools, which triggered an incompatibility. Did you check that the modules and tools for NFS are still the same, that is, that they still have an older timestamp? Post the output of these commands:

rpm -qi nfs-tools
find /lib/modules -name "nfs.ko" -ls

neelpert1 · February 12, 2009, 11:42am

I ran both commands on the client and got no output from either: the rpm query found nothing for nfs-tools, and the find command came back empty as well.
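
A possible reason both commands came back empty, offered as an assumption rather than something confirmed on these machines: the NFS userland tools on Red Hat and SuSE are usually packaged as nfs-utils rather than nfs-tools, and 2.4 kernels such as the client's 2.4.21 name their modules with a .o suffix, while .ko is used from 2.6 on. NFS support may also be built into the kernel, in which case no module file exists at all. The sketch below uses a hypothetical helper, `nfs_module_pattern`, to pick the file-name pattern from a kernel release string.

```shell
# Pick the file-name pattern for the NFS client module from a kernel
# release string such as the output of "uname -r".
nfs_module_pattern() {
    case $1 in
        2.[0-4].*) echo 'nfs.o*' ;;   # 2.4 and earlier: modules end in .o
        *)         echo 'nfs.ko*' ;;  # 2.6 and later: modules end in .ko
    esac
}

# Example calls (package name nfs-utils is the usual one, but verify locally):
#   rpm -q nfs-utils
#   find /lib/modules/"$(uname -r)" -name "$(nfs_module_pattern "$(uname -r)")" -ls
```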

The client is definitely not set up for auto updates, as it is connected only internally on corp. LAN and does not access the Internet.

Auto-updates are also turned off on the server, and updates on that particular NAS server are tightly controlled here.

As far as I know, no major updates were done to either. The client *could* have had the opportunity to slip by, but I just verified that this client is definitely NOT on the corporate LAN. It is firewalled inside of our "dev-only" network.

Thank you for following up on this.

otheus · February 12, 2009, 1:44pm

I'm stumped.

If you have it working with TCP, stay with it: TCP is now the suggested transport anyway. It actually performs better when the server and client are properly tuned.

cjcox · February 12, 2009, 5:10pm

Red Hat AS 3 (I guess you can call it RHEL 3, but that's incorrect) has a bug in its Broadcom tg3 driver; your interface will probably be bouncing a lot. AFAIK, there is no fix for it. Move to RHEL 4 or 5 if you can.

If you're not using the tg3 driver, then I don't know what the issue is....
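
To check whether this applies, one could look for the tg3 module on the client and count link-down events in the system log. This is only a sketch: `uses_tg3` and `count_link_down` are hypothetical helper names, the interface name eth0 is an assumption, and the "Link is down" wording may differ between driver versions, so adjust the pattern to whatever the log actually shows.

```shell
# Succeeds (exit status 0) if "lsmod" output on stdin lists the tg3 module.
uses_tg3() {
    awk '$1 == "tg3" { found = 1 } END { exit !found }'
}

# Prints how many lines on stdin mention the given interface followed by
# "Link is down". The message wording is an assumption.
count_link_down() {
    grep -c "$1.*Link is down"
}

# Example calls on the client:
#   lsmod | uses_tg3 && echo "tg3 driver is loaded"
#   count_link_down eth0 < /var/log/messages
```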

I'd avoid RHAS/RHES v3 if at all possible.