What process is writing to disk?

What program can I use to determine what process is writing to disk?

I've got a Linux server and iostat reports something is writing to the system drive:

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00   169.83  1.75 273.82   141.65  3465.34    13.09    69.29  263.50   2.23  61.42
sda               0.00   252.50  0.00 228.25     0.00  3526.00    15.45    53.86  204.70   2.50  56.95

Unfortunately lsof gives me no serious clue:

lsof +d / | awk '$4 ~ /[0-9].*[uw]/' # search for all files noted to be open for writing/updating.

The result is: ssh-agent, samba (/etc/samba/secrets.tdb, an 8k file) and another log file in /tmp which is clearly idle.

Oh, and swap is empty.

I suppose that high I/O activity means a high system call rate and, depending on the I/O subsystem, high I/O waits. Did you try running lsof with the PIDs of the top processes from top?
I believe fuser could give some clue too.

Try all of these commands:
lsof
top
fuser

  • nilesh

Thanks, radoulov, but I don't see a way for top to display "high system call rate" or "high I/O rates" or anything like that.

fuser is helping, to some extent, but I have no idea how to parse the ACCESS field when using -v. That might give me some clue. Is there any way to see writes-per-second on a per process basis?
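On the ACCESS field: according to the fuser man page, each letter marks one kind of access (c = current directory, e = executable being run, f = open file, F = open file for writing, r = root directory, m = mmap'ed file or shared library), and dots are just placeholders. A minimal sketch that spells a field out; the function name decode_fuser_access is made up for illustration and is not part of fuser.

```shell
#!/bin/sh
# decode_fuser_access is a hypothetical helper, not part of fuser.
# It takes one ACCESS string from `fuser -v` (e.g. "F.c..") and prints
# one line per letter, using the meanings listed in the fuser man page.
decode_fuser_access() {
    printf '%s\n' "$1" | fold -w 1 | while IFS= read -r letter; do
        case "$letter" in
            c) echo "c: current directory" ;;
            e) echo "e: executable being run" ;;
            f) echo "f: open file" ;;
            F) echo "F: open file for writing" ;;
            r) echo "r: root directory" ;;
            m) echo "m: mmap'ed file or shared library" ;;
            *) ;;  # '.' placeholders and anything unknown are skipped
        esac
    done
}
```

The entries with an F are the ones to look at when hunting for writers. Keep in mind that fuser sends only the PIDs to stdout and everything else to stderr, so add 2>&1 before piping the verbose table anywhere.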

What does vmstat 1 5 give you?

Sorry for not being clear, I meant high CPU usage (due to the high system call rate).

You may try iostat -d -p to see I/O activity by partition and thus restrict the possibilities.

Well,
the first thing I would do is strace the top CPU consumers to see where the CPU cycles go.

$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2    136 850584 148660 2616060    0    0   183   143    3    4  9  2 71 17  0
 0  4    136 854368 148816 2616112    0    0   192  2396 2401 1259  0  2 40 58  0
 0  2    136 854740 148876 2616876    0    0     0   736 2056 2842  4  1 50 46  0
 0  3    136 1733284 148944 1738972    0    0   140  1756 2152 3567  0  6 49 45  0
 0  3    136 1717312 149052 1755468    0    0  8244   740 2188 3970  0  2 44 54  0
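A note on reading this: the first row of vmstat output is the average since boot, so only the later rows describe what is happening now. Here is a rough sketch that averages bo and wa over those rows. The helper name vmstat_avg is made up, and the column positions (bo = 10, wa = 16) assume the header layout shown above.

```shell
#!/bin/sh
# vmstat_avg is a hypothetical helper: it reads `vmstat N M` output on stdin,
# skips the header lines and the first sample (averages since boot),
# and prints the mean of bo (column 10) and wa (column 16).
vmstat_avg() {
    awk '
        $1 !~ /^[0-9]+$/ { next }      # skip the two header lines
        ++seen == 1      { next }      # skip the since-boot sample
        { bo += $10; wa += $16; n++ }
        END { if (n) printf "bo=%.2f wa=%.2f\n", bo / n, wa / n }
    '
}
# usage: vmstat 1 5 | vmstat_avg
```

Fed the five samples above, it prints bo=1407.00 wa=50.75, so the box is spending about half its time waiting on I/O while writing steadily.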

There is one process using a lot of CPU. We tried disabling it to see if the IO problem went away, but it did not.

Check this.

Ironically, iostat is taking FOREVER to download. Must be because it's Python-based. :wink:

I'm trying collectl (from sourceforge), but their options in 3.x don't match their documentation (2.4).

Here you go:

sar -x ALL 3 10

I believe this will give some idea ... (it will display per-process paging activity, not I/O activity directly, but I believe they are related).

Maybe something like this?

sar -x ALL 2 10|awk \$4

Try to find a C version :).

It's the download, not the process :slight_smile:

That's pretty cool, but like you said, it's not I/O activity but page faults. Since we're not swapping, this really doesn't tell us anything. But just to be sure, stopping the biggest page-faulter had no effect on I/O load. :frowning:

There is another nice method:
check man 5 proc for the columns that you want to monitor and run something like this:

awk '{print $1,":", $12}' /proc/[0-9]*/stat|sort -t: -k2rn
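One caveat with that one-liner: field 12 is majflt (major page faults) per man 5 proc, but field 2 is the command name in parentheses, and a name containing a space shifts every later field by one. A slightly more defensive sketch cuts the line at the last ")" before counting. The helper name majflt_by_pid is made up for illustration.

```shell
#!/bin/sh
# majflt_by_pid is a hypothetical helper. It reads /proc/PID/stat lines on
# stdin and prints "pid majflt", largest first. The comm field (field 2)
# may contain spaces, so everything through the last ") " is dropped
# before the remaining fields are counted.
majflt_by_pid() {
    awk '
        {
            pid = $1
            rest = $0
            sub(/^.*\) /, "", rest)    # greedy match: cut through the last ") "
            split(rest, f, " ")        # f[1] is now field 3 (state)
            print pid, f[10]           # so field 12 (majflt) is f[10]
        }
    ' | sort -k2,2nr
}
# usage: cat /proc/[0-9]*/stat 2>/dev/null | majflt_by_pid | head
```

As with the original command, the numbers are totals since each process started, so compare two runs to see who is faulting right now.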

Hm ...,
you're right, you need a kernel patch for this :eek:

link
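For reference, on kernels that do have per-task I/O accounting (mainline from 2.6.20 as far as I know, so not the box in question), every process gets a /proc/PID/io file with a write_bytes counter. A minimal sketch of ranking processes by it; the helper name top_writers is made up, and the optional argument exists only so the function can be pointed at a different proc root.

```shell
#!/bin/sh
# top_writers is a hypothetical helper. It prints "write_bytes pid" for every
# process under the given proc root (default /proc), largest first.
# Assumes a kernel that provides /proc/PID/io; reading the files of other
# users' processes generally needs root.
top_writers() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/io; do
        [ -r "$f" ] || continue        # process gone or not readable
        pid=${f%/io}
        pid=${pid##*/}
        awk -v pid="$pid" '$1 == "write_bytes:" { print $2, pid }' "$f" 2>/dev/null
    done | sort -k1,1nr
}
# usage (as root): top_writers | head
```

The counters are cumulative since each process started, so to get a rate you would run it twice a few seconds apart and compare.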

Ah crap... From collectl - Process Monitoring

We're running 2.6.18.
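Something nobody has suggested yet that may work on a kernel this old: the block_dump switch under /proc/sys/vm, which makes the kernel log a line for each block read or written, in the shape "comm(pid): WRITE block N on dev". A rough sketch that counts those lines per process; the helper name tally_block_dump is made up, and I'm assuming that message shape holds on your kernel.

```shell
#!/bin/sh
# tally_block_dump is a hypothetical helper. It reads kernel log lines on
# stdin and counts "WRITE block" messages per process, busiest first.
# Assumes the block_dump message shape "comm(pid): WRITE block N on dev";
# a command name containing spaces would only show its last word.
tally_block_dump() {
    awk '
        / WRITE block / {
            who = ""
            for (i = 2; i <= NF; i++)
                if ($i == "WRITE") { who = $(i - 1); break }
            if (who == "") next
            sub(/:$/, "", who)         # "mysqld(2345):" -> "mysqld(2345)"
            count[who]++
        }
        END { for (w in count) print count[w], w }
    ' | sort -k1,1nr
}
# usage (as root), assuming /proc/sys/vm/block_dump exists on your kernel:
#   echo 1 > /proc/sys/vm/block_dump; sleep 10; echo 0 > /proc/sys/vm/block_dump
#   dmesg | tally_block_dump | head
```

If syslog writes kernel messages to the same disk, the logging itself adds writes, so it is probably worth stopping syslog for the duration of the test. Expect kernel threads such as kjournald or pdflush near the top as well, since they flush data on behalf of other processes.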

...
I hate guessing, but I think it's your best bet now: what application(s) is/are running on this server?

Oh, just SAMBA, MySQL, mcast, proftpd, clam ... nothing much. :slight_smile:

Ah ...
Did you try to stop mysql (if possible given your SLA, of course :))?