Disk I/O high. What to fix?

This is a Solaris 10 box. Disk I/O is high and performance is very slow for the applications running on it. What should I check to fix this issue?

root@dbrpd01:/# iostat -xntz | head -7
   tty
 tin tout
   7   79
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   40.1   80.2  376.4  605.1  2.1  4.3   17.7   35.9   1  10 c1t0d0
   31.1   69.6  290.2  534.4  1.6  3.1   16.2   30.9   1   7 c1t1d0
root@dbrpd01:/# vmstat 5 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 s2 s3 sd   in   sy   cs us sy id
 18 0 0 71728216 21073504 1401 17657 4357 5 5 0 0 120 101 -0 -7505 14249 182183 23659 13 12 76
 23 30 0 95227096 37593168 73 5350 4187 2 2 0 0 386 154 0 0 11169 393110 17954 14 6 80
 2 30 0 95187504 37529632 302 5818 5619 2 2 0 0 403 299 0 12 11549 278131 18816 14 7 80
 13 31 0 95142896 37477960 142 5094 6013 0 0 0 0 381 357 0 4 13089 415174 37007 12 8 79
 9 33 0 95144816 37460264 760 13224 4736 0 0 0 0 394 204 0 0 12028 302935 31685 12 9 80
root@dbrpd01:/#
root@dbrpd01:/#
root@dbrpd01:/# sar -b
SunOS dbrpd01 5.10 Generic_144488-17 sun4u    09/30/2013
00:00:01 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
00:20:01       6    1459     100      36     186      81      70       0
00:40:01       4     666      99      33     139      76      62       0
01:00:01      21    1397      99      81     477      83      71       0
Average       10    1174      99      50     267      81      68       0
root@dbrpd01:/#
root@dbrpd01:/# iostat -En | grep Hard | head -2
c1t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c1t1d0           Soft Errors: 0 Hard Errors: 2 Transport Errors: 2
root@dbrpd01:/#

I do not see any major hardware errors in iostat, though. Its uptime is 190 days.

/# ps -ef | grep -i defunct | wc -l
      18

The above processes are also not large in number, so they may not be contributing to the application slowness.

Defunct processes seldom contribute to high I/O wait times anyway, although they do suggest something is wonky: even Solaris doesn't usually have 18 of those things floating around after such a short uptime (half a year isn't long).
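
If you want to chase them down anyway, something like this should work (standard Solaris tools; the PID is a placeholder):

# list zombies together with the parents that aren't reaping them
ps -eo pid,ppid,s,comm | awk '$3 == "Z"'

# force the parent to reap a specific one (Solaris-specific)
preap <pid>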

First identify whether it's a tuning problem, a rubbish I/O subsystem, or a rubbish application.
Is it still slow when the applications are not running? (i.e. bare OS, nothing else in memory)
Is it slow if you dd a few gigs of /dev/urandom out to a temp file on the disk?
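
A minimal sketch for the dd test (path and size are placeholders; note that /tmp on Solaris is tmpfs, i.e. memory-backed, so write somewhere actually on disk, and that /dev/urandom itself can be CPU-bound, so /dev/zero gives a truer read on raw write throughput):

# write ~2 GB to a disk-backed scratch file and time it
time dd if=/dev/zero of=/var/tmp/ddtest bs=1024k count=2048
rm /var/tmp/ddtest

Run iostat -xnz 5 in another window while it's going to watch what the disks do.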

If it's slow even without the app, see if the performance varies between the two disks; if they differ a lot, look for faults on the slower one. If they're both the same, look for something wrong with the interface card.
If it's slow when the app runs, try to identify which process is spending all its time waiting on I/O and start there (a prstat sketch follows below).
If these are not local disks (i.e. SAN, perhaps), check that your block sizes are sensible and exact multiples of each other, check for excessive contention on the fabric switch, and yell at your SAN admin some - it's always their fault :wink:
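
For the per-process angle, prstat's microstate accounting is the usual tool (the 5 is just a sampling interval):

# per-thread microstates; a consistently high SLP% is time spent
# sleeping, very often blocked on I/O
prstat -mL 5

# and double-check whether FMA has logged anything against the slow disk
fmadm faulty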

Do you have an Oracle DB running on this server? Sometimes a badly written SQL query can cause the slowdown, especially if you have the tablespaces configured on slower internal disks.
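
To check where the tablespaces actually live, df will map a filesystem back to its c#t#d# device (the datafile path here is hypothetical; substitute your own):

# hypothetical datafile location - shows which /dev/dsk device backs it
df -k /u01/oradata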

Are these your root disks? If so, what volume management are you using?
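
For reference, either of these will tell you what's managing the root disks:

# ZFS pools, if any
zpool status

# Solaris Volume Manager metadevices, concise listing
metastat -c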

try doing

vmstat 4

output:

kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr vc vc -- --   in   sy   cs us sy id
 0 0 0 10794904 3118992 96 1023 193 2 2 0 0 10 0  0  0  555 1430  556  2  5 93
 0 0 0 10717432 3083328 0 3 22  0  0  0  0  0  0  0  0  452  165  399  0  1 99

If either of the first two columns keeps incrementing, you are running out of memory and writing to swap.
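
Whatever you make of those columns (see the correction below), actual swap pressure is easy to confirm directly:

# summary of allocated/reserved/available swap
swap -s

# per-device swap usage
swap -l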

I was able to fix it with a reboot. The reboot was scheduled for a different issue, and post-reboot I do not see this problem anymore. Thanks for your inputs.

Not really. The first two columns are the run and blocked thread queues, nothing related to swap issues. Better to monitor the third (w, swapped-out threads) and the twelfth (sr, page scan rate) if you suspect memory pressure.
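
If you'd rather have the scan rate with history than stare at live vmstat, sar keeps it (pgscan/s; a sustained non-zero value means the page scanner is running, i.e. real memory pressure):

# paging history from today's sar data, including pgscan/s
sar -g

# or live, twelve 5-second samples
sar -g 5 12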