otheus
October 29, 2008, 5:27am
1
What program can I use to determine what process is writing to disk?
I've got a Linux server and iostat reports something is writing to the system drive:
Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00  169.83  1.75  273.82  141.65  3465.34     13.09     69.29  263.50   2.23  61.42
sda        0.00  252.50  0.00  228.25    0.00  3526.00     15.45     53.86  204.70   2.50  56.95
Unfortunately lsof gives me no serious clue:
lsof +d / | awk '$4 ~ /[0-9].*[uw]/' # search for all files noted to be open for writing/updating.
The result is: ssh-agent, samba (/etc/samba/secrets.tdb, an 8k file), and another log file in /tmp which is clearly idle.
Oh, and swap is empty.
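A slightly tighter version of the lsof filter above, as a sketch only. It assumes lsof's default column layout, where column 4 is FD (values such as 3u, 7w, cwd, txt), and it keeps the header line. The function name writable_fds is made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only lsof rows whose FD column is a numeric descriptor
# opened for writing (w) or read/write (u). Reads lsof output on stdin.
# writable_fds is an illustrative name, not an lsof feature.
writable_fds() {
    # NR == 1 keeps the header; the anchored regex rejects cwd, txt, mem
    # and read-only descriptors such as 3r.
    awk 'NR == 1 || $4 ~ /^[0-9]+[uw]/'
}
```

Used as `lsof +d / | writable_fds`. Note that this still only shows which files are open for writing, not how much is being written to them.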
I suppose that high I/O activity means a high system call rate and, depending on the I/O subsystem, high I/O waits. Did you try running lsof with the PIDs of the top processes?
I believe fuser could give some clue too.
Try all these commands,
lsof
top
fuser
otheus
October 29, 2008, 9:51am
4
Thanks, radoulov, but I don't see a way for top to display "high system call rate" or "high I/O rates" or anything like that.
fuser is helping, to some extent, but I have no idea how to parse the ACCESS field when using -v. That might give me some clue. Is there any way to see writes-per-second on a per process basis?
vbe
October 29, 2008, 9:57am
5
What does vmstat 1 5 give you?
Sorry for not being clear, I meant high CPU usage (due to the high system call rate).
You may try iostat -d -p to see I/O activity by partition and thus restrict the possibilities.
Well,
the first thing I would begin with is stracing the top CPU consumers to see where the CPU cycles go.
otheus
October 29, 2008, 10:08am
7
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 0  2    136  850584 148660 2616060    0    0   183   143    3    4  9  2 71 17  0
 0  4    136  854368 148816 2616112    0    0   192  2396 2401 1259  0  2 40 58  0
 0  2    136  854740 148876 2616876    0    0     0   736 2056 2842  4  1 50 46  0
 0  3    136 1733284 148944 1738972    0    0   140  1756 2152 3567  0  6 49 45  0
 0  3    136 1717312 149052 1755468    0    0  8244   740 2188 3970  0  2 44 54  0
otheus
October 29, 2008, 10:10am
8
There is one process using a large amount of CPU. We tried disabling it to see if the I/O problem went away, but it did not.
otheus
October 29, 2008, 10:21am
10
Ironically, iostat's is taking FOREVER to download. Must be that they're python based.
I'm trying collectl (from sourceforge), but their options in 3.x don't match their documentation (2.4).
Here you go:
sar -x ALL 3 10
I believe this will give some idea ... (it will display per-process paging activity, not I/O activity directly, but I believe they are related).
Maybe something like this?
sar -x ALL 2 10|awk \$4
Try to find a C version :).
otheus
October 29, 2008, 10:36am
13
It's the download, not the process
otheus
October 29, 2008, 10:37am
14
radoulov:
Here you go:
sar -x ALL 3 10
I believe this will give some idea ... (it will display per-process paging activity, not I/O activity directly, but I believe they are related).
Maybe something like this?
sar -x ALL 2 10|awk \$4
That's pretty cool, but like you said, it's not I/O activity but page faults. Since we're not swapping, this really doesn't tell us anything. But just to be sure, stopping the biggest page-faulter had no effect on I/O load.
There is another nice method:
check man 5 proc for the columns that you want to monitor and run something like this:
awk '{print $1,":", $12}' /proc/[0-9]*/stat|sort -t: -k2rn
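For reference, field 12 in /proc/<pid>/stat is majflt (major page faults) according to man 5 proc. The one-liner above miscounts fields when a process name contains spaces, because the comm field is printed inside parentheses. A minimal sketch that strips everything up to the last ")" first is below; the function name majflt_by_pid and the optional proc-root argument are hypothetical and only there to make it easy to try on sample files.

```shell
#!/bin/sh
# Sketch: list "pid majflt" sorted by major page faults, highest first.
# majflt_by_pid is an illustrative name; the argument defaults to /proc.
majflt_by_pid() {
    proc_root=${1:-/proc}
    for st in "$proc_root"/[0-9]*/stat; do
        [ -r "$st" ] || continue      # process may have exited meanwhile
        # Remove "pid (comm) " so a comm containing spaces cannot shift
        # the numbering. The remainder starts at field 3 (state), so the
        # original field 12 (majflt) is element 10 of the split.
        awk '{ pid = $1; sub(/^.*\) /, ""); split($0, f, " "); print pid, f[10] }' "$st" 2>/dev/null
    done | sort -k2,2nr
}
```

Run as `majflt_by_pid | head`. As with the one-liner, the counters are cumulative since each process started, so two runs need to be compared to see current activity.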
otheus:
That's pretty cool, but like you said, it's not I/O activity but page faults. Since we're not swapping, this really doesn't tell us anything. But just to be sure, stopping the biggest page-faulter had no effect on I/O load.
Hm ...,
you're right, you need a kernel patch for this
link
otheus
October 29, 2008, 11:01am
17
Ah crap... From collectl - Process Monitoring
Process I/O monitoring is limited to kernels that have that capability enabled and that didn't even appear before 2.6.22. If you don't see the file /proc/self/io, your kernel was not built with process I/O accounting enabled and you need to get one that has the following enabled: CONFIG_TASKSTATS, CONFIG_TASK_XACCT and CONFIG_TASK_IO_ACCOUNTING.
We're running 2.6.18.
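For anyone reading this on a kernel that does expose /proc/self/io (so not the 2.6.18 here), this is a rough sketch of how the per-process counters could be ranked. It assumes the write_bytes line in /proc/<pid>/io, which counts bytes the process caused to be sent to the storage layer since it started. The function name top_writers and the optional proc-root argument are made up for illustration, and reading other users' io files generally needs root.

```shell
#!/bin/sh
# Sketch: print "pid write_bytes" for every process, largest writer first.
# Requires a kernel built with CONFIG_TASK_IO_ACCOUNTING; without it the
# io files do not exist and the function prints nothing.
# top_writers is an illustrative name; the argument defaults to /proc.
top_writers() {
    proc_root=${1:-/proc}
    for io in "$proc_root"/[0-9]*/io; do
        [ -r "$io" ] || continue      # unreadable, or process already gone
        pid=${io%/io}                 # strip the trailing /io
        pid=${pid##*/}                # keep only the PID directory name
        # Exact match on the key so cancelled_write_bytes is not picked up.
        wbytes=$(awk '$1 == "write_bytes:" { print $2 }' "$io" 2>/dev/null)
        if [ -n "$wbytes" ]; then
            echo "$pid $wbytes"
        fi
    done | sort -k2,2nr
}
```

Running `top_writers | head` twice a few seconds apart and comparing the numbers for each PID gives a rough bytes-written-per-second figure per process.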
...
I hate guessing, but I think it's your best bet now: what application(s) is/are running on this server?
otheus
October 29, 2008, 11:10am
19
Oh, just SAMBA, MySQL, mcast, proftpd, clam ... nothing much.
Ah ...
Did you try to stop mysql (if possible given your SLA, of course :))?