I'm running on Solaris 10, and I have a script that's running on several machines. Basically, what it's doing is:
- tail -f one or more log files, grep the output, and append it to a temp file
- Every minute or so, copy that temp file to a second temp file and zero the first
- sed through the second temp file to pull out a user ID
- grep through the file for occurrences of that ID together with some other text (the other text comes from a local file that I read in a loop)
- Output a text record to a log file consisting of the user, that text, and a count
Another server then does a tail -f of the output log file.
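In rough outline, the loop looks like this (the file names, the user= pattern, and terms.txt are placeholders, not the real ones):

#!/usr/bin/ksh
# Placeholder names throughout; the real script differs in the details.
# Step 1: tail the source log and grep matching lines into a temp file
tail -f /var/log/app/in.log | grep "PATTERN" > /tmp/cap1 &

while true
do
    sleep 60
    cp /tmp/cap1 /tmp/cap2    # step 2: rotate the capture file...
    > /tmp/cap1               # ...and zero the first temp
    # step 3: pull the user ID out of the captured lines
    uid=$(sed -n 's/.*user=\([A-Za-z0-9]*\).*/\1/p' /tmp/cap2 | head -1)
    # steps 4-5: count each term's hits for that user and log a record
    while read term
    do
        count=$(grep "$uid" /tmp/cap2 | grep -c "$term")
        echo "$uid $term $count" >> /var/log/app/out.log
    done < /usr/local/etc/terms.txt
done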
The user CPU for this script is small, on the order of 2-4% depending on which box I'm running it on. But the crazy thing is that on my test box, with a single input data stream, as soon as I start the script, the kernel % jumps from about 1% to 60%.
As far as I can see, the issue is NOT disk I/O wait or memory (half of real memory is still free, and the paging values are very small).
# sar -g 5 5
SunOS ssdev01 5.10 138889-03 i86pc    03/03/2010

13:09:01  pgout/s  ppgout/s  pgfree/s  pgscan/s  %ufs_ipf
13:09:06     0.20      0.80      0.60      0.00      0.00
13:09:11     0.40      1.00      0.80      0.00      0.00
13:09:16     0.20      0.80      0.80      0.00      0.00
13:09:21     0.20      0.80      0.60      0.00      0.00
13:09:26     0.60      1.80      1.40      0.00      0.00
Average      0.32      1.04      0.84      0.00      0.00
Pre-run top:
load averages: 0.05, 0.27, 0.66 13:25:39
69 processes: 68 sleeping, 1 on cpu
CPU states: 96.7% idle, 2.5% user, 0.8% kernel, 0.0% iowait, 0.0% swap
Memory: 2048M real, 1126M free, 486M swap in use, 2652M swap free
During run:
load averages: 0.68, 0.39, 0.68 13:26:19
78 processes: 75 sleeping, 1 running, 1 zombie, 1 on cpu
CPU states: 20.5% idle, 18.3% user, 61.2% kernel, 0.0% iowait, 0.0% swap
Memory: 2048M real, 1123M free, 491M swap in use, 2647M swap free
Here's the script in ps:
# /usr/ucb/ps -aux | more
USER       PID %CPU %MEM   SZ  RSS TT      S    START  TIME COMMAND
ops      15436  4.7  0.1 1524 1028 ?       S 13:25:51  0:06 /usr/bin/ksh /sscp
Any ideas what I can look at to see what's chewing up all that kernel time?