We have a SPARC T4-1 server running Solaris 11 that does fairly extensive parsing of a roughly 100 GB data set.
All was well until a few weeks ago: when I was testing performance, I was seeing roughly 50-minute run times, which was more or less what I expected.
Now that I've started moving the server to its final location, testing shows that performance has dropped dramatically. For some reason I can no longer parse the data set in under 2 hours, and I don't understand why.
I haven't made any modifications to the system apart from some network configuration, and to my knowledge nobody else has touched it.
Where should I start looking for why the parsing performance has dropped by more than half? The job is obviously quite HDD-intensive given the size of the data set, but our zpool appears healthy and reports no errors.
I checked /var/adm/messages, and there were no errors either.
prstat doesn't show anything odd; only our parsing processes (sed) are taking most of the time.
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
2096 username 9024K 1880K cpu55 0 0 2:26:36 1.6% sed/1
2098 username 8960K 1808K cpu28 0 0 2:26:30 1.6% sed/1
2097 username 9024K 1848K sleep 22 0 0:41:52 0.4% grep/1
2095 username 9024K 1872K sleep 59 0 0:02:13 0.0% grep/1
2099 username 132M 122M sleep 59 0 0:01:19 0.0% sort/1
327 root 0K 0K sleep 99 -20 0:01:30 0.0% zpool-workarea/262
621 root 702M 226M sleep 59 0 0:02:07 0.0% eptelemon/2
539 root 17M 13M sleep 59 0 0:01:07 0.0% ldmd/16
47 netcfg 4944K 2824K sleep 59 0 0:00:00 0.0% netcfgd/4
81 daemon 7936K 5936K sleep 59 0 0:00:00 0.0% kcfd/3
102 root 3136K 1128K sleep 59 0 0:00:00 0.0% in.mpathd/1
44 root 5184K 4064K sleep 59 0 0:00:00 0.0% dlmgmtd/8
74 netadm 5344K 3032K sleep 59 0 0:00:00 0.0% ipmgmtd/5
116 root 2392K 1776K sleep 59 0 0:00:00 0.0% pfexecd/3
793 root 3360K 1944K sleep 59 0 0:00:00 0.0% in.routed/1
NPROC USERNAME SWAP RSS MEMORY TIME CPU
11 username 244M 164M 0.5% 5:38:31 3.6%
62 root 1047M 444M 1.3% 0:06:10 0.0%
1 netcfg 4944K 2824K 0.0% 0:00:00 0.0%
2 daemon 11M 6912K 0.0% 0:00:00 0.0%
1 netadm 5344K 3032K 0.0% 0:00:00 0.0%
Total: 82 processes, 914 lwps, load averages: 2.63, 2.64, 2.64
I tried looking at iostat and vmstat, but I don't really understand them. Any pointers on what I should look at?
The output during parsing looks like this. iostat:
tty sd1 sd2 sd3 sd4 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
1 77 1664 14 4 1669 14 4 123 17 14 536 6 7 2 1 0 98
0 0 3355 26 3 3303 26 3 0 0 0 1119 23 3 4 1 0 96
0 0 3343 34 3 3414 31 3 0 0 0 0 0 0 4 1 0 96
0 0 3328 26 2 3226 25 4 0 0 0 0 0 0 4 1 0 96
0 0 2790 22 2 2893 23 3 0 0 0 0 0 0 4 1 0 96
vmstat:
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s2 s3 s4 in sy cs us sy id
0 0 0 14679712 16787992 7 16 0 0 0 0 8 14 14 17 6 2114 3539 2189 2 1 98
0 0 0 1991952 4236944 2 6 0 0 0 0 0 27 26 0 0 1988 4960 2104 4 1 96
0 0 0 1957032 4202024 0 1 0 0 0 0 0 26 26 0 20 2029 4929 2157 4 1 96
0 0 0 1923640 4168632 0 1 0 0 0 0 0 32 35 0 0 2029 4940 2163 4 1 96
0 0 0 1893096 4138088 0 0 0 0 0 0 0 31 32 0 0 2001 4889 2115 4 1 96
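One way to read the numbers above: the cpu columns show the box mostly idle and the per-disk kps/serv figures look modest, so the disks are probably not the bottleneck. The extended per-device view makes this easier to confirm. A sketch, assuming Solaris iostat; the 60% busy threshold and the awk filter are purely illustrative:

```shell
# Solaris: extended per-device statistics every 5 seconds;
# add -z to skip devices with zero activity:
#   iostat -xnz 5
# Watch asvc_t (average service time, ms) and %b (percent busy).
# An illustrative awk filter that flags devices more than 60% busy
# ($10 is %b and $11 is the device name in iostat -xn output):
iostat -xn 5 | awk 'NR > 2 && $10+0 > 60 { print $11, "is", $10"% busy" }'
```

If nothing shows up as busy while the job crawls, the bottleneck is more likely CPU-side, e.g. a single-threaded process pegging one hardware thread.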
I know I should change our CPU threading mode to max-ipc (it's currently max-throughput), but that hasn't changed since the last time I ran these tests.
Thanks for any tips!
---------- Post updated 09-28-12 at 05:23 PM ---------- Previous update was 09-27-12 at 07:12 PM ----------
OK, it looks like I found the problem. Processing times are back to normal after I changed the locale back to C from UTF-8.
For some reason it had ended up as UTF-8 (I'm not sure who or what changed it), and naturally that made sed and grep work much harder, since in a UTF-8 locale they have to decode multibyte characters instead of just comparing raw bytes.
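For anyone hitting the same thing: the fix can be applied per pipeline without changing the system-wide locale. A minimal sketch (the file name and patterns are placeholders, not our actual job):

```shell
# Show which locale categories child processes will inherit
locale

# Force the byte-oriented C locale for just this pipeline;
# sed and grep then compare raw bytes instead of decoding UTF-8:
LC_ALL=C sed -e 's/foo/bar/' bigfile.txt | LC_ALL=C grep -c bar
```

LC_ALL overrides LANG and all the individual LC_* variables, so prefixing each command is enough even if the login environment stays UTF-8.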