Unable to mount previously working NFS share from NIM to LPAR

tmooredba · April 10, 2019, 7:01am

Right, now that I've finally worked out this website, I'll ask my question!

I am having an absolute nightmare with NFS on AIX. I have used it many times and know what I'm doing; however, I cannot fathom what is going on here. I have two LPARs on the same physical host, configured with both an internal and an external network; the internal network is the one being used here. As far as the network connections go, nothing has changed since this was last working. However, when I try to mount any filesystem exported from NIM on LPAR1, I get a timeout:

nfsmnthelp: NIMsvr: Connection timed out

I have checked the following:

/etc/hosts is correct on both, and I have tried using both networks
NFS is started on both NIM and LPAR1. I have tried restarting the services using `stopsrc -g nfs; stopsrc -s portmap`, then starting them again
Stopping services, then running `rm -rf /etc/state /etc/sm /etc/sm.bak /etc/xtab /etc/rmtab; startsrc -s portmap; startsrc -g nfs; exportfs -a; showmount -e NIMsvr`. The last command shows the mount is available
Removing the export from NIM and the mount from LPAR1, restarting NFS on both NIM and LPAR1, then adding the export back in and remounting (I checked `showmount -e` before and after re-adding it, and the export shows up the second time)
Telnet to port 111 from LPAR1 to NIM works fine
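To confirm the "NFS is started on both" check without reading the `lssrc` output by eye, a small filter can list any subsystem in the nfs group that is not active. This is a sketch, not part of the original troubleshooting: the function name `nfs_inactive_subsystems` is hypothetical, and it assumes the usual `lssrc -g nfs` layout (Subsystem, Group, PID, Status) with the status as the last field.

```sh
# Hypothetical helper: reads `lssrc -g nfs` output on stdin and prints
# the name of every subsystem whose status is not "active".
# Assumes the header line comes first and the status is the last field;
# inoperative subsystems have no PID, so a fixed column number would break.
nfs_inactive_subsystems() {
    awk 'NR > 1 && $NF != "active" { print $1 }'
}
```

On each host you would run something like `lssrc -g nfs | nfs_inactive_subsystems`, plus `lssrc -s portmap` for the port mapper; no output from the filter means every subsystem in the group reported active.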

I am out of ideas; can anyone help, please? I am about to pull my last few hairs out!

hicksd8 · April 10, 2019, 7:16am

I'm not an AIX expert, so I am only commenting from a generic point of view. That said, have you verified which NFS versions are being used on each side?

On modern Unix systems and storage, NFS can come in versions 2, 3, and 4.

If one implementation is a later version than the other, you can specify the version (2, 3, or 4) on the mount command line.

Trying to interoperate across different versions often gives rise to odd behaviour, errors, and malfunctions.

I'm still thinking about it and if I come up with anything else I'll post again.
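As a sketch of pinning the version on the client, the helper below only prints an AIX mount command with `-o vers=N` so it can be reviewed before it is run. The function name, the default of version 3 (the version tmooredba says he is using later in the thread), and the mount point in the example are assumptions for illustration; confirm the option against `man mount` on your AIX level.

```sh
# Hypothetical helper: print (not run) an NFS mount command with an
# explicit protocol version.
# Usage: nfs_mount_cmd SERVER EXPORT MOUNTPOINT [VERSION]
nfs_mount_cmd() {
    if [ $# -lt 3 ]; then
        echo "usage: nfs_mount_cmd SERVER EXPORT MOUNTPOINT [VERSION]" >&2
        return 1
    fi
    vers=${4:-3}
    # Only the three NFS versions mentioned in the thread are accepted.
    case $vers in
        2|3|4) ;;
        *) echo "unsupported NFS version: $vers" >&2; return 1 ;;
    esac
    # Print rather than execute, so the command can be checked first.
    printf 'mount -o vers=%s %s:%s %s\n' "$vers" "$1" "$2" "$3"
}
```

For example, `nfs_mount_cmd NIMsvr /export/archive /mnt 3` prints `mount -o vers=3 NIMsvr:/export/archive /mnt`.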

hicksd8 · April 10, 2019, 7:25am

And, of course, the access rights must allow the connection: BOTH the NFS share AND the permissions on the directory itself.

So, for testing only, you could set the permissions on the shared directory to 777 and share the NFS handle '-o rw,root' so that the incoming NFS mount request gets root rights. It's dangerous to leave it like that, but if it then works, that tells you something.

tmooredba · April 10, 2019, 7:36am

Hi,

Thanks for the suggestion, but unfortunately this didn't work. I get the same error as before.

When you said "share the NFS handle '-o rw,root'", I assume you meant putting this into the exports file for that share?

hicksd8 · April 10, 2019, 7:46am

Yes, indeed. Again speaking generically something like:

# share -F nfs -o rw,root  <directory>

or

# share -F nfs -o rw,root=<client>  <directory>

if <client> is in the hosts file of the NFS serving node.

tmooredba · April 10, 2019, 8:02am

OK, yeah, so that doesn't work. The share command also doesn't accept that syntax on my NIM server:

[root@NIMsvr export]$ share -F nfs -o rw,root /export/archive
share: 1831-186 nfs not found in /etc/exports
share: 1831-186 -o not found in /etc/exports
share: 1831-186 rw,root not found in /etc/exports
share: 1831-190 unknown option: root
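The `share -F nfs` syntax above is Solaris-style; on AIX the usual place for this is an entry in /etc/exports (followed by `exportfs -a`), where options take a leading dash and host lists are colon-separated. A sketch that builds such a line for the read-write/root test hicksd8 describes; the helper name is hypothetical, while the path and `LPAR1` come from this thread.

```sh
# Hypothetical helper: build an AIX /etc/exports line granting
# read-write and root access to one or more clients (testing only).
# Usage: aix_export_line DIRECTORY CLIENT [CLIENT...]
aix_export_line() {
    if [ $# -lt 2 ]; then
        echo "usage: aix_export_line DIRECTORY CLIENT [CLIENT...]" >&2
        return 1
    fi
    dir=$1
    shift
    # printf repeats its format once per client; strip the trailing colon.
    hosts=$(printf '%s:' "$@")
    hosts=${hosts%:}
    printf '%s -rw,root=%s\n' "$dir" "$hosts"
}
```

For a one-off export that ignores /etc/exports, AIX `exportfs` also accepts something like `exportfs -i -o rw,root=LPAR1 /export/archive`; check `man exportfs` on your system, and remove the root access once the test is done.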

hicksd8 · April 10, 2019, 8:08am

tmooredba · April 10, 2019, 9:58am

Perhaps I've missed something, but I'm not sure how relevant that post is. It shows mountd in use, then discusses TCP and UDP, and the thread never gets resolved. Have I missed something in that thread?

hicksd8 · April 10, 2019, 10:06am

No, you didn't miss anything in that thread, but it did say that different NFS versions use different protocols. The moderator who said that, Bakunin, is very knowledgeable on AIX.

He might chip in when he sees your thread.

tmooredba · April 10, 2019, 10:14am

Ah OK, that's fine; I was already aware of that. I'm using NFSv3. mountd is registered on both TCP and UDP ports, and I can connect from LPAR1 to NIM on the TCP port:

[root@NIMsvr /]$ rpcinfo -p | grep mountd
    100005    1   tcp  57906  mountd
    100005    2   tcp  57906  mountd
    100005    3   tcp  57906  mountd
    100005    1   udp  38084  mountd
    100005    2   udp  38084  mountd
    100005    3   udp  38084  mountd

[root@LPAR1 ~]$ telnet nimsvr 57906
Trying...
Connected to NIMsvr.
Escape character is '^]'.
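Because mountd registers on a dynamic port, the number can change whenever the daemon restarts (57906 here, 32768 in the rpcinfo output after the NIM server was later restarted), so it is worth looking the port up each time rather than reusing an old one. A minimal sketch that pulls the port for a given service and protocol out of `rpcinfo -p` output; the function name is hypothetical, and it assumes the standard five columns (program, vers, proto, port, service).

```sh
# Hypothetical helper: read `rpcinfo -p` output on stdin and print the
# distinct ports a service is registered on for one protocol.
# Usage: rpc_ports SERVICE PROTO   (e.g. rpc_ports mountd tcp)
rpc_ports() {
    # Columns: $3 = proto, $4 = port, $5 = service name.
    awk -v svc="$1" -v pr="$2" '$5 == svc && $3 == pr { print $4 }' | sort -u
}
```

On LPAR1 this could be used as `port=$(rpcinfo -p NIMsvr | rpc_ports mountd tcp)` followed by `telnet NIMsvr "$port"`.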

M_Nixon · May 23, 2019, 3:59am

Check that RPC is running on both servers:

rpcinfo -p <remote server>
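Beyond checking that rpcinfo answers at all, it can help to confirm that every program NFSv3 relies on is registered on the server. A sketch that reads `rpcinfo -p` output and prints any missing service; the program numbers are the standard ONC RPC ones (100000 portmapper, 100003 nfs, 100005 mountd, 100021 nlockmgr, 100024 status), and the function name is hypothetical.

```sh
# Hypothetical helper: read `rpcinfo -p` output on stdin and print the
# name of each NFSv3-related RPC program that is not registered.
rpc_missing_services() {
    awk '
        # Remember every program number seen in column 1.
        { seen[$1] = 1 }
        END {
            n = split("100000:portmapper 100003:nfs 100005:mountd 100021:nlockmgr 100024:status", want, " ")
            for (i = 1; i <= n; i++) {
                split(want[i], kv, ":")
                if (!(kv[1] in seen)) print kv[2]
            }
        }'
}
```

Something like `rpcinfo -p NIMsvr | rpc_missing_services` would print nothing if all five programs are registered.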

tmooredba · May 23, 2019, 5:29am

It is indeed running. I recently restarted the NIM server, and it took a while to come up: it stalled while starting the rpc.mountd daemon, then finished OK:

0513-059 The nfsd Subsystem has been started. Subsystem PID is 5177506.
05/23 09:10:51 tftpd: [00000001] EZZ7001I Starting.
0513-059 The rpc.mountd Subsystem has been started. Subsystem PID is 3080418.
0513-059 The rpc.statd Subsystem has been started. Subsystem PID is 4325578.
0513-059 The rpc.lockd Subsystem has been started. Subsystem PID is 3277042.
Completed NFS services.

See the output of the rpcinfo command below:

UKDCMMORA-1:>rpcinfo -p NIMsvr
   program vers proto   port  service
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    200006    1   udp   2049
    200006    4   udp   2049
    200006    1   tcp   2049
    200006    4   tcp   2049
    100005    1   tcp  32768  mountd
    100005    2   tcp  32768  mountd
    100005    3   tcp  32768  mountd
    100005    1   udp  32805  mountd
    100005    2   udp  32805  mountd
    100005    3   udp  32805  mountd
    400005    1   udp  32806
    100021    1   udp  32971  nlockmgr
    100021    2   udp  32971  nlockmgr
    100021    3   udp  32971  nlockmgr
    100021    4   udp  32971  nlockmgr
    100021    1   tcp  32770  nlockmgr
    100021    2   tcp  32770  nlockmgr
    100021    3   tcp  32770  nlockmgr
    100021    4   tcp  32770  nlockmgr
    100024    1   tcp  32774  status
    100024    1   udp  33021  status
    100133    1   tcp  32774
    100133    1   udp  33021
    200001    1   tcp  32774
    200001    1   udp  33021
    200001    2   tcp  32774
    200001    2   udp  33021

RecoveryOne · June 7, 2019, 2:13pm

Hi, I've seen something like this on systems with both an external and an internal network. Make sure your routing is set up correctly on the LPARs. I know, that's pretty generic advice.

Also, is there any firewall between your systems? That is another common cause of timeouts.
If there is a firewall, consider adding mountd entries to /etc/services and then restarting the rpc.mountd service. By default, I think mountd can use any port from 32768 to 65535, so by pinning it to a fixed port you reduce the number of firewall rules that need to be changed.
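A sketch of the /etc/services change described above. The port number 6666 used in the example is hypothetical (choose one your firewall allows and nothing else uses), and the function name is made up; it refuses to touch a file that already has a mountd entry, and the optional file argument lets you try it on a copy first.

```sh
# Hypothetical helper: append fixed-port mountd entries to a services file.
# Usage: pin_mountd_port PORT [FILE]   (FILE defaults to /etc/services)
pin_mountd_port() {
    port=$1
    file=${2:-/etc/services}
    # Basic sanity check: the port must be a number in the valid range.
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 1
    fi
    # Do not add a second definition if mountd is already listed.
    if grep -q '^mountd[[:space:]]' "$file"; then
        echo "mountd already listed in $file; edit it by hand" >&2
        return 1
    fi
    printf 'mountd\t%s/tcp\nmountd\t%s/udp\n' "$port" "$port" >> "$file"
}
```

After backing up /etc/services and running something like `pin_mountd_port 6666`, restart the subsystem with `stopsrc -s rpc.mountd; startsrc -s rpc.mountd`, then check `rpcinfo -p NIMsvr | grep mountd` to see whether the new port was picked up before changing the firewall rules.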