I've come across a relatively recent problem: for roughly the last two months, the root disk has gone to 99% utilization for about 20 seconds whenever a user logs in. This happens whether the user logs in locally or via ssh. I have tried using lsof to track down the process that is pegging the disk, but it returns nothing until the disk thrashing is over, and by then its output is no longer useful. So I'm looking for a way to proactively narrow down the problem. Here is some more info:
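Because lsof only shows open files, a sampling approach may work better for catching the culprit while the disk is busy. Below is a minimal sketch, assuming a Linux kernel with per-task I/O accounting so that /proc/<pid>/io exists, and run as root so that other users' processes are readable. It snapshots read_bytes and write_bytes for every PID once per interval and prints the processes whose counters grew the most. The function names (read_io, snapshot, top_io) are hypothetical helpers, not an existing tool. Processes that start and exit entirely between two samples are missed, so a short interval helps.

```python
import os
import time


def read_io(pid, proc_root="/proc"):
    """Return (read_bytes, write_bytes) from <proc_root>/<pid>/io, or None if unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "io")) as f:
            # Lines look like "read_bytes: 4096"; int() tolerates the trailing newline.
            fields = dict(line.split(": ", 1) for line in f if ": " in line)
    except OSError:
        # Process exited, or we lack permission (non-root reading another user's PID).
        return None
    return int(fields["read_bytes"]), int(fields["write_bytes"])


def comm(pid, proc_root="/proc"):
    """Return the short command name of a PID, or '?' if it is gone."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except OSError:
        return "?"


def snapshot(proc_root="/proc"):
    """Map every numeric PID directory to its (read_bytes, write_bytes)."""
    snap = {}
    for name in os.listdir(proc_root):
        if name.isdigit():
            io = read_io(name, proc_root)
            if io is not None:
                snap[int(name)] = io
    return snap


def top_io(before, after, n=10):
    """Return up to n (bytes_delta, pid) pairs, largest first.

    PIDs absent from `before` are assumed to have started during the interval,
    so their whole cumulative count is attributed to it.
    """
    deltas = []
    for pid, (r1, w1) in after.items():
        r0, w0 = before.get(pid, (0, 0))
        deltas.append(((r1 - r0) + (w1 - w0), pid))
    deltas.sort(reverse=True)
    return deltas[:n]


def main(interval=1.0, proc_root="/proc"):
    before = snapshot(proc_root)
    while True:
        time.sleep(interval)
        after = snapshot(proc_root)
        for total, pid in top_io(before, after, n=5):
            if total > 0:
                stamp = time.strftime("%H:%M:%S")
                print(f"{stamp} pid={pid} comm={comm(pid, proc_root)} bytes={total}")
        before = after


if __name__ == "__main__":
    main()
```

Started in one terminal before logging in from another, it should print the PIDs and command names responsible for the burst while it is still happening.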
If this happens no matter which user logs in, create a dummy user and alter /etc/profile so that it turns on shell tracing (set -x) for that user only. You can then at least see where a hang, if any, occurs when you log in as the dummy user.
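A plain set -x trace shows which commands ran but not how long each one took. One option (an assumption on my part, not something from the advice above) is to make bash prefix every trace line with a timestamp, for example PS4='+ $(date +%s.%N) ' set only for the dummy user, and to save the trace output to a file. Assuming lines of the form "+ 1700000000.123456789 command", the minimal sketch below finds the largest gaps between consecutive trace lines and attributes each gap to the command that preceded it. The function name largest_gaps and the log format are hypothetical.

```python
import re
import sys

# bash repeats the first PS4 character to show nesting depth ("+", "++", ...),
# so accept one or more '+' followed by the timestamp and the traced command.
TRACE_LINE = re.compile(r"^\++ (\d+\.\d+) (.*)$")


def largest_gaps(lines, n=5):
    """Return the n largest (seconds, command) gaps between consecutive trace lines.

    Each gap is charged to the earlier command: it was running (or whatever
    followed it was slow) before the next trace line appeared.
    Lines that do not match the timestamped format are ignored.
    """
    entries = []
    for line in lines:
        m = TRACE_LINE.match(line.rstrip("\n"))
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    gaps = [(t1 - t0, cmd) for (t0, cmd), (t1, _) in zip(entries, entries[1:])]
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:n]


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for seconds, cmd in largest_gaps(f):
            print(f"{seconds:8.3f}s  {cmd}")
```

A 20-second gap after a single line in /etc/profile or one of the files it sources would point straight at the step that triggers the disk activity.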
It almost has to be related to software or a shell script; if it were hardware or the filesystem, the problem would also occur under other circumstances.
If that shows nothing, then PAM is your next target for investigation.
So I found the script that was causing the slow logins, but other symptoms have surfaced as well. Sometimes a simple 'ls' on a directory with 30 files makes the disk spin for 45 seconds or so while the box is under no other I/O load. Again, no error messages appear, and the SMART data looks fine.
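To tell whether the stall is in reading the directory itself or in looking up the individual files (ls usually does both, for example to decide on colors or file types), one could time the two phases separately. This is a minimal sketch, assuming Python 3 is available on the box; time_listing is a hypothetical helper. Only a run against a cold cache is informative, because a second run will likely be served from memory.

```python
import os
import sys
import time


def time_listing(path):
    """Time reading a directory, then lstat() each entry individually.

    Returns (listdir_seconds, [(stat_seconds, name), ...]) with the slowest
    entries first.
    """
    t0 = time.perf_counter()
    names = os.listdir(path)  # roughly the readdir() part of ls
    listdir_seconds = time.perf_counter() - t0

    stat_times = []
    for name in names:
        t0 = time.perf_counter()
        try:
            os.lstat(os.path.join(path, name))  # the per-entry metadata lookup
        except OSError:
            pass  # entry vanished or is unreadable; the time spent is still recorded
        stat_times.append((time.perf_counter() - t0, name))
    stat_times.sort(reverse=True)
    return listdir_seconds, stat_times


if __name__ == "__main__":
    secs, stats = time_listing(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"listdir: {secs:.3f}s")
    for t, name in stats[:5]:
        print(f"lstat {t:.3f}s  {name}")
```

If listdir alone takes most of the time, the directory read is where the disk stalls; if one or two entries account for it, those inodes are the ones to look at.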