AIX login prompt delayed when the number of users increases

When only a few users are on the system, I can log in to the AIX server without problems. As the number of users increases, the login prompt is delayed, and sometimes the login even times out. This started after the upgrade to AIX 7.1 TL 4. Can someone suggest a way to overcome this?

How are you logging in?

What is the number of users on the system when you can no longer login reliably?

What is the load average on the system when you can no longer login reliably?

Has the amount of work assigned to this system changed since the upgrade? If so, how?

Maybe, but to do so we need to know more about the system:

How many is an "increased number" of users? (no exact number required, but are we talking about 10, 100 or 1000 users here?)

What is the size of the system? How many processors? How much memory? Have you tried running vmstat -tw 1 while many users are logged in? The output of vmstat -vs might also help. If so, please post the significant portions of the output here (within CODE tags).

How is name resolution done? We once had "network problems" where the network itself was working fine, but the name server took a long time to answer, so connections seemed to be established slowly. Overall, can you describe your network a bit (in case the culprit is there)?

How do the many users you describe use the system? Do they use a certain application, do they work mainly on the command line, or do they only connect via the application layer (for example, a database running on the server that your users reach with DB Visualizer or a similar tool)?

This is, off the top of my head, all I can come up with. Please provide some more data and we will take it from there.

I hope this helps.

bakunin

PS: seems like I am not typing fast enough to keep up with Don. :wink:

There were 250 users on the system when the issue was happening. We connect over SSH, from a GUI terminal (xterm), to log in to the servers.
The network does not seem to be the issue.

System configuration: lcpu=64 mem=256511MB

 kthr          memory                         page                       faults           cpu       time
------- --------------------- ------------------------------------ ------------------ ----------- --------
  r   b        avm        fre    re    pi    po    fr     sr    cy    in     sy    cs us sy id wa hr mi se
  0   0    4749214    4687647     0     0     0     0      0     0     6   2970   760  0  0 99  0 00:19:28
  0   0    4750341    4686520     0     0     0     0      0     0    18  35490   893  0  1 99  0 00:19:29
  0   0    4749218    4687643     0     0     0     0      0     0     2  14121   815  0  0 99  0 00:19:30
  0   0    4749218    4687638     0     0     0     0      0     0    17   3871   836  0  0 99  0 00:19:31
vmstat -vs
           4465310913 total address trans. faults
            294360283 page ins
             49324316 page outs
                    0 paging space page ins
                    0 paging space page outs
                    0 total reclaims
           1465526223 zero filled pages faults
                60683 executable filled pages faults
            114267450 pages examined by clock
                    0 revolutions of the clock hand
             55792932 pages freed by the clock
             13653791 backtracks
                    0 free frame waits
                    0 extend XPT waits
             52515800 pending I/O waits
            343680275 start I/Os
             80649413 iodones
           3850688645 cpu context switches
            228964899 device interrupts
             44816910 software interrupts
           2057467216 decrementer interrupts
               505680 mpc-sent interrupts
               505680 mpc-receive interrupts
               230520 phantom interrupts
                    0 traps
         262745045277 syscalls
             65666912 memory pages
             63779457 lruable pages
              4687825 free pages
                    8 memory pools
              4365673 pinned pages
                 90.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 88.0 numperm percentage
             56179798 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 88.0 numclient percentage
                 90.0 maxclient percentage
             56171829 client pages
                    0 remote pageouts scheduled
                 1190 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2400 filesystem I/Os blocked with no fsbuf
                44434 client filesystem I/Os blocked with no fsbuf
                25130 external pager filesystem I/Os blocked with no fsbuf
                  7.3 percentage of memory used for computational pages

---------- Post updated 02-03-17 at 07:17 AM ---------- Previous update was 02-02-17 at 09:20 PM ----------

This is the output right now, while the problem is happening:

vmstat -vs
           4704754850 total address trans. faults
            295484675 page ins
             50773171 page outs
                    0 paging space page ins
                    0 paging space page outs
                    0 total reclaims
           1548595741 zero filled pages faults
                62036 executable filled pages faults
            114267450 pages examined by clock
                    0 revolutions of the clock hand
             55792932 pages freed by the clock
             14409311 backtracks
                    0 free frame waits
                    0 extend XPT waits
             52975790 pending I/O waits
            346252431 start I/Os
             81917039 iodones
           3938511790 cpu context switches
            233154812 device interrupts
             46845701 software interrupts
           2125584724 decrementer interrupts
               528400 mpc-sent interrupts
               528400 mpc-receive interrupts
               242521 phantom interrupts
                    0 traps
         264369806068 syscalls
             65666912 memory pages
             63779457 lruable pages
              3413598 free pages
                    8 memory pools
              4381020 pinned pages
                 90.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 89.8 numperm percentage
             57312540 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 89.8 numclient percentage
                 90.0 maxclient percentage
             57304175 client pages
                    0 remote pageouts scheduled
                 1235 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2400 filesystem I/Os blocked with no fsbuf
                44434 client filesystem I/Os blocked with no fsbuf
                25136 external pager filesystem I/Os blocked with no fsbuf
                  7.5 percentage of memory used for computational pages
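
Most of the vmstat -vs counters are cumulative since boot, so two snapshots are easier to read as a difference than side by side. The following is only a minimal sketch, assuming each snapshot was saved to its own file (before.txt and after.txt are hypothetical names) and that every counter line has the "value label" layout shown above. vmstat_delta is an illustrative helper, not an AIX command.

```shell
#!/bin/sh
# Print the counters whose values differ between two saved
# `vmstat -vs` outputs, one line per counter: "label: before -> after".
# Usage: vmstat_delta before.txt after.txt
vmstat_delta() {
    awk '
        # Skip anything that is not "<number> <label>" (blank lines,
        # the echoed command line, and so on).
        $1 !~ /^[0-9]+(\.[0-9]+)?$/ { next }
        {
            value = $1
            label = $0
            # Strip the leading whitespace and the number to get the label.
            sub(/^[ \t]*[0-9.]+[ \t]+/, "", label)
        }
        # First file: remember each value by its label.
        NR == FNR { before[label] = value; next }
        # Second file: report only the labels whose value changed.
        (label in before) && before[label] != value {
            printf "%s: %s -> %s\n", label, before[label], value
        }
    ' "$1" "$2"
}
```

Called as vmstat_delta before.txt after.txt on the two snapshots posted here, it would list, among others, free pages (4687825 -> 3413598), numperm percentage (88.0 -> 89.8, close to the maxperm percentage of 90.0) and pending disk I/Os blocked with no pbuf (1190 -> 1235). Counters that did not move, such as the 2400 filesystem I/Os blocked with no fsbuf, are left out.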

Please use code tags, not icode tags, press the button.

A debugging tip. You can trace the sshd.

# proctree | grep -w sshd
   258180   /usr/sbin/sshd a
      274564   sshd: root@pts/0 a 

Then trace the originator/listener i.e. pid 258180

# truss -f -p 258180

And then do a login. You'll see what it does, and where it takes a long time.


Further investigation has shown that the delays occur when the loginsuccess() function, called by sshd, tries to acquire a lock on the file /etc/security/lastlog.
Any idea what could be causing this, or where to look for a solution?