Disk performance problem on login

dangral · September 21, 2010, 11:35am

Running CentOS 5.5:

I've come across a relatively recent problem, where in the last 2 months or so, the root disk goes to 99% utilization for about 20 seconds when a user logs in. This occurs whether a user logs in locally or via ssh. I have tried using lsof to track down the process that is pegging the disk, but no results are returned until the disk thrashing is over, and I don't get any useful results. So I'm looking for a way to proactively narrow down the problem. Here is some more info:

Neo · September 21, 2010, 11:48am

When the users log in, do they call the same login shell / script from /etc/passwd? If so, which one?

dangral · September 21, 2010, 12:36pm

Users' login shell is /bin/bash. I don't see anything interesting in .bashrc or .bash_profile. Nothing of note in /etc/profile either.

mark54g · September 21, 2010, 1:29pm

check in /etc/profile.d/
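
One rough way to check those scripts is to source each one in a subshell and time it. This is only a sketch: the function name time_profile_scripts is made up for illustration, the timing has whole-second resolution, and a script that was slow at login may look fast here if its files are already in the page cache.

```shell
# Hypothetical helper: time each *.sh script in a directory.
# Each script is sourced in a subshell so it cannot alter the current shell.
time_profile_scripts() {
    dir=${1:-/etc/profile.d}
    for f in "$dir"/*.sh; do
        # Skip unreadable files (and the literal pattern when nothing matches).
        [ -r "$f" ] || continue
        start=$(date +%s)
        ( . "$f" ) >/dev/null 2>&1
        end=$(date +%s)
        echo "$((end - start))s $f"
    done
}
```

Calling time_profile_scripts with no argument checks /etc/profile.d; any script that reports several seconds is worth reading closely.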

jim_mcnamara · September 21, 2010, 1:34pm

If this is true for any user, then create a dummy user. Alter /etc/profile to turn tracing on for that user only. You can at least detect where a hang, if any, occurs when you log in as dummy.
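
A minimal sketch of that change, assuming the test account is literally named "dummy" and that the fragment goes near the top of /etc/profile. The timestamped PS4 prompt is an optional extra, not something required for tracing; it just makes a long gap between two traced commands easy to spot.

```shell
# Hypothetical fragment for the top of /etc/profile.
# "dummy" is an assumed name for the throwaway test account.
if [ "$(id -un)" = "dummy" ]; then
    # Prefix every traced command with the time, source file, and line number.
    PS4='+ $(date "+%H:%M:%S") ${BASH_SOURCE}:${LINENO}: '
    # Print each command before it runs, for this user only.
    set -x
fi
```

Logging in as dummy should then print every command run from /etc/profile and the files it sources; remove the fragment once the slow step has been found.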

It almost has to be software/shell script related. If it were hardware or filesystems, the problem would occur under other circumstances as well.

If that shows nothing then PAM is your next target for investigation.

dangral · December 7, 2010, 11:02am

So I found the script that was causing the poor performance, but other symptoms have surfaced as well. Sometimes a simple 'ls' on a directory with 30 files will cause the disk to spin for 45 or so seconds while the box is under no other I/O load. Again, no error messages are apparent, and SMART data looks fine.