We have a SPARC T4-1 server running Solaris 11 that does fairly extensive parsing of a roughly 100 GB data set.
All was well until a few weeks ago: when I was testing performance, I was seeing roughly 50-minute run times, which was more or less what I expected.
Now that I've started moving the server to its final location, testing shows that performance has dropped dramatically. For some reason I can no longer parse the data set in under 2 hours, and I don't understand why.
I haven't made any modifications to the system apart from some network configuration, and to my knowledge nobody else has touched it.
Where should I start looking for why the parsing performance has dropped by more than half? The job is obviously quite HDD-intensive given the size of the data set, but our zpool appears healthy and reports no errors.
I checked /var/adm/messages, and there were no errors either.
prstat doesn't show anything odd; only our parsing processes (sed) are taking most of the time.
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
2096 username 9024K 1880K cpu55 0 0 2:26:36 1.6% sed/1
2098 username 8960K 1808K cpu28 0 0 2:26:30 1.6% sed/1
2097 username 9024K 1848K sleep 22 0 0:41:52 0.4% grep/1
2095 username 9024K 1872K sleep 59 0 0:02:13 0.0% grep/1
2099 username 132M 122M sleep 59 0 0:01:19 0.0% sort/1
327 root 0K 0K sleep 99 -20 0:01:30 0.0% zpool-workarea/262
621 root 702M 226M sleep 59 0 0:02:07 0.0% eptelemon/2
539 root 17M 13M sleep 59 0 0:01:07 0.0% ldmd/16
47 netcfg 4944K 2824K sleep 59 0 0:00:00 0.0% netcfgd/4
81 daemon 7936K 5936K sleep 59 0 0:00:00 0.0% kcfd/3
102 root 3136K 1128K sleep 59 0 0:00:00 0.0% in.mpathd/1
44 root 5184K 4064K sleep 59 0 0:00:00 0.0% dlmgmtd/8
74 netadm 5344K 3032K sleep 59 0 0:00:00 0.0% ipmgmtd/5
116 root 2392K 1776K sleep 59 0 0:00:00 0.0% pfexecd/3
793 root 3360K 1944K sleep 59 0 0:00:00 0.0% in.routed/1
NPROC USERNAME SWAP RSS MEMORY TIME CPU
11 username 244M 164M 0.5% 5:38:31 3.6%
62 root 1047M 444M 1.3% 0:06:10 0.0%
1 netcfg 4944K 2824K 0.0% 0:00:00 0.0%
2 daemon 11M 6912K 0.0% 0:00:00 0.0%
1 netadm 5344K 3032K 0.0% 0:00:00 0.0%
Total: 82 processes, 914 lwps, load averages: 2.63, 2.64, 2.64
I tried looking at iostat and vmstat, but I don't really understand them. Any pointers on what I should look at?
The output during parsing looks like this. iostat:
tty sd1 sd2 sd3 sd4 cpu
tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id
1 77 1664 14 4 1669 14 4 123 17 14 536 6 7 2 1 0 98
0 0 3355 26 3 3303 26 3 0 0 0 1119 23 3 4 1 0 96
0 0 3343 34 3 3414 31 3 0 0 0 0 0 0 4 1 0 96
0 0 3328 26 2 3226 25 4 0 0 0 0 0 0 4 1 0 96
0 0 2790 22 2 2893 23 3 0 0 0 0 0 0 4 1 0 96
vmstat:
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s2 s3 s4 in sy cs us sy id
0 0 0 14679712 16787992 7 16 0 0 0 0 8 14 14 17 6 2114 3539 2189 2 1 98
0 0 0 1991952 4236944 2 6 0 0 0 0 0 27 26 0 0 1988 4960 2104 4 1 96
0 0 0 1957032 4202024 0 1 0 0 0 0 0 26 26 0 20 2029 4929 2157 4 1 96
0 0 0 1923640 4168632 0 1 0 0 0 0 0 32 35 0 0 2029 4940 2163 4 1 96
0 0 0 1893096 4138088 0 0 0 0 0 0 0 31 32 0 0 2001 4889 2115 4 1 96
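One way to read the numbers above: the cpu columns show the box mostly idle and the per-disk kps/serv figures look modest, so the disks are probably not the bottleneck. The extended per-device view makes this easier to confirm. A sketch, assuming Solaris iostat; the 60% busy threshold and the awk filter are purely illustrative:

```shell
# Solaris: extended per-device statistics every 5 seconds;
# add -z to skip devices with zero activity:
#   iostat -xnz 5
# Watch asvc_t (average service time, ms) and %b (percent busy).
# An illustrative awk filter that flags devices more than 60% busy
# ($10 is %b and $11 is the device name in iostat -xn output):
iostat -xn 5 | awk 'NR > 2 && $10+0 > 60 { print $11, "is", $10"% busy" }'
```

If nothing shows up as busy while the job crawls, the bottleneck is more likely CPU-side, e.g. a single-threaded process pegging one hardware thread.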
I know I should change our CPU threading mode to max-ipc (it's currently max-throughput), but that hasn't changed since the last time I ran these tests.
Thanks for any tips!
---------- Post updated 09-28-12 at 05:23 PM ---------- Previous update was 09-27-12 at 07:12 PM ----------
OK, it looks like I found the problem. Processing times are back to normal after I changed the locale back to C from UTF-8.
For some reason it had ended up as UTF-8 (I'm not sure who or what changed it), and naturally that made sed and grep work much harder, since in a UTF-8 locale they have to decode multibyte characters instead of just comparing raw bytes.
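For anyone hitting the same thing: the fix can be applied per pipeline without changing the system-wide locale. A minimal sketch (the file name and patterns are placeholders, not our actual job):

```shell
# Show which locale categories child processes will inherit
locale

# Force the byte-oriented C locale for just this pipeline;
# sed and grep then compare raw bytes instead of decoding UTF-8:
LC_ALL=C sed -e 's/foo/bar/' bigfile.txt | LC_ALL=C grep -c bar
```

LC_ALL overrides LANG and all the individual LC_* variables, so prefixing each command is enough even if the login environment stays UTF-8.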