Count occurrences of X Y Z in a file in 1 go.

msullivan · March 2, 2009, 1:38am

Hi. I need to count multiple occurrences of X Y Z in a file in 1 go. At the moment I have the following scripts:
ssh readonly@$ServerIP 'YEAR=xx;DAY=xx;MONTH=xx;LMONTH=xx;for i in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 \
16 17 18 19 20 21 22 23; do cat /var/SP/log/cre/access.log_$YEAR$MONTH$DAY*_$i | grep -c "HTTP/1.1\" \"503";done'>>$sshEF
This one counts HTTP/1.1 503. There are identical commands for 500, 400, 403 and 404.

Now I have to look for HTTP response codes in the hourly log files on an apache web server.. and count them.
At the moment my grep command runs through the log files once for each response type.

I have also tried
var_500=0;var_503=0;var_400=0;var_403=0;var_404=0
for i in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23;
do cat /var/SP/log/cre/access.log_$YEAR$MONTH$DAY*$i | while read line
do
variable=`echo $line | awk '{print$7}'` #print response code with awk
case $variable in #ERRORS 1
500 ) var_500=`expr $var_500 + 1`;;
503 ) var_503=`expr $var_503 + 1`;;
400 ) var_400=`expr $var_400 + 1`;;
403 ) var_403=`expr $var_403 + 1`;;
404 ) var_404=`expr $var_404 + 1`;;
* ) hello=hello
esac
datetime=$i
Server=`hostname`
date="$YEAR/$MONTH/$DAY"
for p in 500 503 400 403 404;
do
err_desc="HTTP/1.1 $p"
Value=`echo $(var$p)`
echo "vl$Servername,$date,$err_desc,$Value"
done
done

But the CPU utilization and run time are too high for this to be a viable solution (4 min CPU time at over 3% utilization, whereas the grep -c takes 10 seconds at 0.1%).

Is there an easy way to count multiple things in a file in one go? Or should I just stick with grep -c?

Thanks in advance

ripat · March 2, 2009, 2:06am

This looks like a typical problem for awk. I mean 100% awk. Can you post a sample input file and the desired output?

msullivan · March 2, 2009, 2:25am

This is sample input:
10.113.98.16 10.113.155.52 - - [02/Mar/2009:09:00:01 +0200] "GET /mi_icons/misc/png240/divider.png HTTP/1.1" 200 227 "http://live.vo
Dafone.com" "Mozilla/5.0 (SymbianOS/9.3; U; Series60/3.2 Samsung/I8510/XXHJ3; Profile/MIDP-2.1 Configuration
/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/413 UP.Link/6.3.1.12.0" cookie:"$Version=1;JSESSIONID=947BCC46C9C2A6DF3460044
72E043C9E" "-" from:"41.26.19.165" 3g:"no" "27768552299" D:"1275" trusted:"-" x-up-from:"41.26.19.165"
10.113.98.16 10.113.155.52 - - [02/Mar/2009:09:00:01 +0200] "GET /img/ca/logo-red-bg.gif HTTP/1.1" 200 429 "http://owafe011.vodacomm
i.co.za/join.ravenriley.com/track/picdom;6279:RR:RR,0,0,0,/" "SonyEricssonW880i/R8BA Browser/NetFront/3.3 Profile/MIDP-2.0 Configura
tion/CLDC-1.1 UP.Link/6.3.1.12.0" cookie:"-" "-" from:"41.28.172.226" 3g:"no" "27825900717" D:"1129" trusted:"-" x-up-from:"41.28.17
2.226"

The response code is the number right after the quoted request (200 in both lines above).

Desired output will be something like:
HTTP/1.1" 500,<count value>
HTTP/1.1" 503,<count value>
HTTP/1.1" 400,<count value>
HTTP/1.1" 403,<count value>
HTTP/1.1" 404,<count value>

ripat · March 2, 2009, 2:29am

Just to put you on track, try this:

#! /bin/bash

awk '
{total[$9] += 1} 
END {
	for (i in total) 
	print i, total[i]
}' /var/log/apache2/access.log /var/log/apache2/access.log.1

edit: If this doesn't produce the desired output try $10 instead of $9 as it seems that the HTTP response code in your sample file is on position 10
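
For the exact format asked for earlier in the thread (only the five error codes, comma separated), a sketch along these lines might do. It assumes the status code is field 10, as in the sample lines, and builds a small made-up log with mktemp purely so it can be tried anywhere; the variable names (log, counts, want) are illustrative only.

```shell
# Build a tiny sample log (hypothetical data; field 10 is the status code,
# matching the layout of the sample lines in this thread).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 10.0.0.2 - - [02/Mar/2009:09:00:01 +0200] "GET /a HTTP/1.1" 200 227
10.0.0.1 10.0.0.2 - - [02/Mar/2009:09:00:02 +0200] "GET /b HTTP/1.1" 404 0
10.0.0.1 10.0.0.2 - - [02/Mar/2009:09:00:03 +0200] "GET /c HTTP/1.1" 503 0
10.0.0.1 10.0.0.2 - - [02/Mar/2009:09:00:04 +0200] "GET /d HTTP/1.1" 404 0
EOF

# One pass over the file: tally every status code, then report only the
# codes of interest, in a fixed order, printing 0 for codes never seen.
counts=$(awk '
{ total[$10]++ }
END {
    n = split("500 503 400 403 404", want, " ")
    for (k = 1; k <= n; k++)
        printf "HTTP/1.1\" %s,%d\n", want[k], total[want[k]]
}' "$log")

printf '%s\n' "$counts"
rm -f "$log"
```

Because the END block walks a fixed list instead of using for (i in total), the output order is stable and codes with no hits still show up with a count of 0.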

msullivan · March 2, 2009, 2:35am

Thanks I will have a look

msullivan · March 2, 2009, 2:39am

Superb!! Thanks ripat!
It goes to about 3.5% CPU utilization, but finishes in about 15 seconds, which is fair game for me.

Output:
502 3
304 1359
503 88
404 467
200 31817
500 301
302 207

*Edit: But if there is another way to do it which uses very little CPU, please feel free to post a reply

msullivan · March 2, 2009, 3:17am

I changed the awk command to:
awk '{total[$10] += 1} END{for (i in total) print "HTTP/1.1 "i, total[i]}'
and output is:
HTTP/1.1 304 591
HTTP/1.1 503 24
HTTP/1.1 404 402
HTTP/1.1 200 21480
HTTP/1.1 500 5
HTTP/1.1 302 141

Is there a way I can pass a variable to the awk command so that the output will be:
var=00
00 HTTP/1.1 304 xxx
var=01
01 HTTP/1.1 304 xxx

*Edit: I changed the command to nawk and the result is a huge drop in CPU utilization, and quicker results.

ripat · March 2, 2009, 3:32am

Sure you can. Either by variable substitution inside the awk code or by assigning the shell variable to an awk variable with the -v switch.
The GNU Awk User's Guide
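
As a rough illustration of the two approaches, here is a minimal sketch. The variable names hour, h, with_v and with_subst are made up for the example, and a one-line printf stands in for a log file.

```shell
hour=07   # hypothetical shell variable holding the hour label

# Option 1: pass the shell variable in with -v; awk sees it as the variable h.
with_v=$(printf 'x\n' | awk -v h="$hour" '{ print h, "HTTP/1.1" }')

# Option 2: close the single quotes, let the shell expand $hour, reopen them.
# The value is pasted into the awk program text, so it has to be wrapped in
# awk string quotes to keep the leading zero.
with_subst=$(printf 'x\n' | awk '{ print "'"$hour"'", "HTTP/1.1" }')

echo "$with_v"
echo "$with_subst"
```

Both lines should read "07 HTTP/1.1". The -v form is usually easier to get right, because the value never becomes part of the awk program itself.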

Instead of looping through your log files and executing the awk code block on every loop, why don't you input all these files into awk like:

awk 'awk code block' file1 file2 file3

This would enable you to use the awk FILENAME variable to print the file name in front of the results.

awk '{total[FILENAME" "$10] += 1} END{for (i in total) print "HTTP/1.1 "i, total[i]}' file1 file2 file3
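
If only the hour is wanted instead of the whole file name, one possible variation is to cut it out of FILENAME. This is a sketch that assumes the hour is whatever follows the last underscore in the name (as in access.log_20090302_00); the temporary directory and the shortened log lines are made up so the example can run on its own.

```shell
dir=$(mktemp -d)
# Two hypothetical hourly logs, named like the ones in this thread.
printf '%s\n' 'a b - - [t z] "GET /x HTTP/1.1" 200 1' \
              'a b - - [t z] "GET /y HTTP/1.1" 404 1' > "$dir/access.log_20090302_00"
printf '%s\n' 'a b - - [t z] "GET /x HTTP/1.1" 200 1' > "$dir/access.log_20090302_01"

per_hour=$(awk '
FNR == 1 {                      # first line of each file: work out its hour
    hour = FILENAME
    sub(/.*_/, "", hour)        # keep only what follows the last underscore
}
{ total[hour " HTTP/1.1 " $10]++ }
END { for (key in total) print key, total[key] }
' "$dir"/access.log_20090302_* | sort)

printf '%s\n' "$per_hour"
rm -rf "$dir"
```

All the hourly files go through a single awk process, and the trailing sort is only there because for (key in total) returns keys in no particular order.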

msullivan · March 2, 2009, 6:18am

Thanks, I'll go have a look at the user's guide.
:D:D:D
Thank you so very much ripat. I came right.

for t in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23; do echo "Day_hour: $t"; cat /var/SP/log/apache/cre/access.log_20090302_$t | nawk -v a=$t '{total[$10] += 1} END{for (i in total) print a" HTTP/1.1 "i, total[i]}';done
std out--->
Day_hour: 00
00 HTTP/1.1 304 1457
00 HTTP/1.1 404 504
00 HTTP/1.1 500 18
00 HTTP/1.1 503 46
00 HTTP/1.1 200 30510
00 HTTP/1.1 302 167
Day_hour: 01
01 HTTP/1.1 304 701
01 HTTP/1.1 404 227
01 HTTP/1.1 500 8
01 HTTP/1.1 503 11
01 HTTP/1.1 200 17281
01 HTTP/1.1 302 98

msullivan · March 2, 2009, 7:04am

As much fun as it is to see this work on a server, I am trying to figure out how to get this working in an ssh command.

ssh readonly@100.100.100.1 'for t in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23; do echo "Day_hour: $t"; cat /var/SP/log/apache/cre/access.log_20090302_$t | nawk -v a=$t '{total[$10] += 1} END{for (i in total) print a" HTTP/1.1 "i, total[i]}';done'

These scripts of mine work fine:
ssh readonly@$ServerIP 'YEAR=xx;DAY=xx;MONTH=xx;LMONTH=xx;for i in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 \
16 17 18 19 20 21 22 23; do cat /var/SP/log/cre/access.log_$YEAR$MONTH$DAY*_$i | grep -c "HTTP/1.1\" \"500";done'

but somehow I need to pass the whole "nawk..." string to ssh, anyone have any ideas?
I keep getting
bash: syntax error near unexpected token `(i'

ripat · March 2, 2009, 7:30am

If you don't want to do the infamous "escape dance", put your command in a file and do

ssh usr@host < command_file
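
A related way to feed the commands on standard input without a separate file is a here-document with a quoted delimiter, so the local shell leaves the $ signs and single quotes alone. The sketch below is only an assumption about how that could look: remote() is a hypothetical stand-in that runs a local sh so the example can be tried without a server, and a printf line stands in for the log file.

```shell
# Stand-in for the remote side so the sketch can be tried locally.
# Assumption: on a real host this would be something like
#   remote() { ssh usr@host sh -s; }
remote() { sh -s; }

# The quoted delimiter ('EOF') stops the local shell from expanding
# anything inside the here-document, so no extra escaping is needed.
out=$(remote <<'EOF'
for t in 00 01; do
    printf 'x x x x x x x x x 200\n' |
        awk -v a="$t" '{ total[$10]++ } END { for (i in total) print a, i, total[i] }'
done
EOF
)
echo "$out"
```

The awk program keeps its plain single quotes, which is the point of sending the script on stdin instead of as an ssh argument.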

Also, avoid the useless use of cat:

cat file_name | awk 'awk_code_block'

Try

awk 'awk_code_block' file_name

Useless use of cat

msullivan · March 2, 2009, 7:48am

I wish I could put it in a file... but I'm too lazy to do that....
but I did get rid of the cat though... and tada! some fancy escape work:

Servername=cre02 ; ServerIP=10.113.98.17;ssh readonly@$ServerIP 'for t in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23; do echo "Day_hour: $t"; nawk -v a=$t '\''{total[$10] += 1} END{for (i in total) print a,i, total[i]}'\'' /var/SP/log/apache/cre/access.log_20090302_$t;done'

What the poor single quotes did to deserve this treatment... I dunno
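
For anyone puzzling over the '\'' sequence, a small sketch of what it does. Here sh -c is used as a local stand-in for the shell on the far end of ssh, and cmd and result are names made up for the example.

```shell
# Inside single quotes nothing is special, not even a backslash, so a
# literal single quote takes three steps: close the quote ('), add an
# escaped quote (\'), then reopen the quote (').
cmd='awk '\''{ print $2 }'\'''

echo "$cmd"                             # shows the string the second shell receives
result=$(echo "a b c" | sh -c "$cmd")   # sh -c stands in for the remote shell here
echo "$result"
```

The first echo should show the awk command with ordinary single quotes around its program, which is what the remote shell then parses.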

Thanks again ripat!!

rrk001 · March 30, 2009, 11:37am

you can use the -v argument

nawk -v str="01" '{total[$10] += 1} END{for (i in total) print str " HTTP/1.1 "i, total[i]}'

rrk001 · March 30, 2009, 11:38am

sorry. too late with the reply