awk & CPU Load

Dear All,

I'm writing a simple awk script to generate a report. The script will check 24 files (one file is generated each hour of the day) and then print one field to another file for counting purposes.

The script is working fine but the problem is that the CPU load is very high and almost reaching 100% when the script is running.

Is there a way to improve the CPU Load?

The machine I'm using is a Sparc SUNW,Sun-Fire-V240, and the awk command I'm using is:
cat $year$month$day*.TLG | awk 'BEGIN {FS=","} {if ($1 == "203" || $1 == "204" || $1 == "205" || $1 == "206") print $3}' >> file.txt

This should work for you:

nice awk -F, '$1 > 202 && $1 < 207 {print $3}' $year$month$day*.TLG >> file.txt


Hi.

I can't imagine why that would max out a Sparc!

In any case, it's best not to use plain awk on Solaris. Use nawk or /usr/xpg4/bin/awk instead:

nawk -F, '$1 ~ /^20[3-6]$/ {print $3 > "file.txt"}' $year$month$day*.TLG

I have tried both options, but unfortunately the problem is still the same!

Are the variables $year, $month and $day actually set (i.e. otherwise you're generating a report based on possibly every *.TLG file)?

How big is each TLG file?

Are you saying that the CPU usage is low until you run the script?

Does the script finish at all? Is file.txt created? Does it contain the correct data?

Are the variables $year, $month and $day actually set (i.e. otherwise you're generating a report based on possibly every *.TLG file)?

Yes, the variables are set correctly. I have checked again, and I am definitely not searching the whole set of *.TLG files.

How big is each TLG file?

The size of each file is not fixed; it depends on the traffic during that hour. Sometimes the size is 100 KB, and sometimes it is over 600 KB.

Are you saying that the CPU usage is low until you run the script?

The CPU is fine until the script starts. Before starting the script, if you check the CPU load with the vmstat command, the idle value varies between 85% and 96%; once the script starts, the idle value drops to 0.

Does the script finish at all? Is file.txt created? Does it contain the correct data?

The script is generating the file successfully. One thing to note: the same line of this script has to run in 6 different directories, since I have 26 *.TLG files generated in each of 6 different directories.

And are the directories locally mounted, or are they network directories?

I honestly have to say I've never known an awk report over 26 files (or even 156 files) of the size you mention to throttle a Solaris server.

Are there other I/O- (if not CPU-) intensive jobs running? Is there a problem with your filesystem?

More questions than answers, sorry!

I really appreciate your help. The filesystems are locally mounted, and there is no problem with the filesystem.

Well, it will use whatever CPU it can get to get the job done (unless you want it to take all day!). But that doesn't mean your system can't do anything else, especially if you use the nice option from danmero.
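
For example, to run the whole thing at the lowest priority so that anything else gets the CPU first (just a sketch; adjust the glob to your actual files):

nice -n 19 nawk -F, '$1 ~ /^20[3-6]$/ {print $3}' $year$month$day*.TLG >> file.txt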

How about using a for loop or splitting the files into smaller ones? Do you think that might work?

Can you help me a little with the for loop code, as it will be my first time using one.

It's not the size of the files; it's awk using the CPU to do its job.

I created your scenario (6 directories with 26 files of 500 KB each, even though there are only 24 hours in a day!) and it took 23 seconds to complete. The CPU was maxed out, but so what? The system didn't grind to a halt.

You really have nothing to worry about. If something else needs CPU, it will still get it.
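
And if you do want to try the for loop you asked about, here is a minimal sketch (the directory names are made up; substitute your real six paths):

for dir in /data/dir1 /data/dir2 /data/dir3 /data/dir4 /data/dir5 /data/dir6
do
    nawk -F, '$1 ~ /^20[3-6]$/ {print $3}' $dir/$year$month$day*.TLG >> file.txt
done

Bear in mind the loop won't reduce the total CPU work; it just does the same job one directory at a time.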