Multiple pattern matching using awk and getting a count of lines

aemunathan · April 17, 2012, 6:42pm

Hi,

I have a file with multiple rows of data. I want to match a pattern on two columns, and if both conditions are satisfied, increment a counter by 1 and finally print the count. How should I proceed?

I tried this:

awk -F, 'BEGIN {cnt = 0} {if $6 == "VLY278" && substr($3,1,10) ~ /2012-04-15/  {cnt=cnt+1} } END {print cnt}' /tmp/p13_1.txt

but it doesn't work.
The sample data is:

Chubler_XL · April 17, 2012, 7:32pm

Try this:

In awk, if requires parentheses around its condition, so this is the correct syntax for your solution:

awk -F, 'BEGIN {cnt = 0} {if($6 == "VLY278" && substr($3,1,10) ~ /2012-04-15/)  {cnt=cnt+1} } END {print cnt}' /tmp/p13_1.txt
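The sample data wasn't posted, so here is a minimal sketch for trying the corrected command on a few made-up rows. It assumes a hypothetical comma-separated layout where $3 holds a timestamp like "2012-04-15 08:00:00" and $6 holds a code; the other fields are placeholders.

```shell
#!/bin/sh
# Hypothetical sample rows: only field 3 (timestamp) and field 6 (code) matter.
sample='1,a,2012-04-15 08:00:00,x,y,VLY278
2,b,2012-04-15 09:30:00,x,y,VLY280
3,c,2012-04-14 10:00:00,x,y,VLY278
4,d,2012-04-15 11:45:00,x,y,VLY278'

# Same logic as the corrected command; note the parentheses around the if condition.
count=$(printf '%s\n' "$sample" | awk -F, '
BEGIN { cnt = 0 }
{
    if ($6 == "VLY278" && substr($3, 1, 10) ~ /2012-04-15/) { cnt = cnt + 1 }
}
END { print cnt }')

echo "$count"   # prints 2 (rows 1 and 4)
```

With real data, replace the printf pipe with the file name as the last argument to awk, as in the original command.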

Below is an optimised version (the date in your data is 14, not 15):

awk -F, '$6=="VLY278" && $3 ~ /^2012-04-14*/ {cnt++} END {print cnt}' infile
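One small caveat with the optimised version: cnt is never initialised, so if no line matches, the END block prints an empty line rather than 0. Printing cnt+0 forces a numeric result. A sketch using hypothetical rows in which nothing matches:

```shell
#!/bin/sh
# Hypothetical rows, none of which has VLY278 in field 6.
sample='1,a,2012-04-14 08:00:00,x,y,VLY280
2,b,2012-04-14 09:30:00,x,y,TLY340'

# cnt is never assigned, so "print cnt" outputs an empty string.
plain=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" {cnt++} END {print cnt}')

# Adding 0 turns the uninitialised value into the number 0.
forced=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" {cnt++} END {print cnt+0}')

echo "plain=[$plain] forced=[$forced]"   # plain=[] forced=[0]
```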

aemunathan · April 17, 2012, 7:35pm

Hi,

Thanks for the response.
Can I use an OR condition in that as well? For example:

awk -F, '$6=="VLY278" || $6 == "VLY280" || $6 == "VLY366" || $6 == "TLY340" && $3 ~ /^2012-04-14*/ {cnt++} END {print cnt}' infile

Chubler_XL · April 17, 2012, 7:40pm

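Yes, but you should put parentheses around the OR conditions. In awk, && binds more tightly than ||, so without them the date test only applies to the last comparison ($6 == "TLY340"), and VLY278, VLY280 and VLY366 lines are counted for any date: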

awk -F, '($6=="VLY278" || $6 == "VLY280" || $6 == "VLY366" || $6 == "TLY340") && $3 ~ /^2012-04-14*/ {cnt++} END {print cnt}' infile
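A small sketch of the precedence difference, on hypothetical rows and with only two of the codes for brevity. The trailing * from the thread's date pattern is left out here so that the only thing being compared is the grouping of || and &&.

```shell
#!/bin/sh
# Hypothetical rows: a VLY278 line from the wrong date, a TLY340 line from the
# right date, and a TLY340 line from the wrong date.
sample='1,a,2012-04-13 08:00:00,x,y,VLY278
2,b,2012-04-14 09:30:00,x,y,TLY340
3,c,2012-04-13 10:00:00,x,y,TLY340'

# Parsed as: $6=="VLY278" || ($6=="TLY340" && date matches),
# so the VLY278 row is counted even though its date is wrong.
unbracketed=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" || $6 == "TLY340" && $3 ~ /^2012-04-14/ {cnt++} END {print cnt+0}')

# Parenthesised: the date test applies to every code.
bracketed=$(printf '%s\n' "$sample" | awk -F, '($6 == "VLY278" || $6 == "TLY340") && $3 ~ /^2012-04-14/ {cnt++} END {print cnt+0}')

echo "unbracketed=$unbracketed bracketed=$bracketed"   # unbracketed=2 bracketed=1
```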

or

awk -F, '$6 ~ /VLY278|VLY280|VLY366|TLY340/ && $3 ~ /^2012-04-14*/ {cnt++} END {print cnt}' infile
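One thing to watch with the regex alternative: without anchors, each alternative can match anywhere inside $6, so a longer code that merely contains VLY278 would also be counted. Wrapping the alternation in ^( ... )$ requires the whole field to equal one of the codes. A sketch with hypothetical codes (whether such codes occur in the real file is unknown):

```shell
#!/bin/sh
# Hypothetical codes: one exact match and two longer codes that contain a listed code.
sample='1,a,2012-04-14 08:00:00,x,y,VLY278
2,b,2012-04-14 09:30:00,x,y,VLY2781
3,c,2012-04-14 10:00:00,x,y,XTLY340'

# Unanchored: matches a code anywhere inside field 6, so all three rows count.
loose=$(printf '%s\n' "$sample" | awk -F, '$6 ~ /VLY278|VLY280|VLY366|TLY340/ {cnt++} END {print cnt+0}')

# Anchored: the entire field must be one of the listed codes.
exact=$(printf '%s\n' "$sample" | awk -F, '$6 ~ /^(VLY278|VLY280|VLY366|TLY340)$/ {cnt++} END {print cnt+0}')

echo "loose=$loose exact=$exact"   # loose=3 exact=1
```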

aemunathan · April 17, 2012, 7:50pm

Hi,

The output shows a much larger count than I expected.

What could be the problem?

Chubler_XL · April 17, 2012, 8:14pm

Try changing cnt++ to print. It will then output each matching line, which should help you work out what it's matching.

You can also pipe this output to wc -l to confirm the count of matching lines.
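A sketch of that debugging approach on hypothetical rows (all with VLY278 in $6), keeping the date pattern exactly as posted in the thread. Printing the matches shows which dates are being caught, and wc -l on the printed lines should agree with the cnt++ count.

```shell
#!/bin/sh
# Hypothetical rows; only the dates differ.
sample='1,a,2012-04-14 08:00:00,x,y,VLY278
2,b,2012-04-11 09:30:00,x,y,VLY278
3,c,2012-04-19 10:00:00,x,y,VLY278'

# Same condition as before, but print each matching line instead of counting it.
printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" && $3 ~ /^2012-04-14*/ {print}'

# Count the printed lines (tr strips the leading spaces some wc versions add)
# and compare with the cnt++ version of the same condition.
lines=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" && $3 ~ /^2012-04-14*/ {print}' | wc -l | tr -d ' ')
cnt=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" && $3 ~ /^2012-04-14*/ {cnt++} END {print cnt+0}')

echo "lines=$lines cnt=$cnt"   # lines=3 cnt=3
```

In this made-up data, the printed rows include 2012-04-11 and 2012-04-19, which is the kind of clue that points to an over-broad pattern.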

neutronscott · April 17, 2012, 8:50pm

This part is wrong:

$3 ~ /^2012-04-14*/

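You are using a regular expression, not a shell glob. In a regex, * repeats the preceding character zero or more times, so 4* can match no 4 at all and the pattern matches anything starting with 2012-04-1 (for example 2012-04-11 or 2012-04-19). Drop the asterisk, or replace it with a space so the match ends with the date (assuming the date in $3 is followed by a space).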
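A sketch comparing the original pattern with the two suggested fixes, on hypothetical rows. The space variant assumes $3 looks like "2012-04-14 08:00:00"; an exact string comparison on the first 10 characters is included as a further option that avoids regex metacharacters altogether.

```shell
#!/bin/sh
# Hypothetical rows; only the dates differ.
sample='1,a,2012-04-14 08:00:00,x,y,VLY278
2,b,2012-04-11 09:30:00,x,y,VLY278
3,c,2012-04-19 10:00:00,x,y,VLY278'

# Original pattern: 4* means "zero or more 4s", so every 2012-04-1x date matches.
buggy=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" && $3 ~ /^2012-04-14*/ {cnt++} END {print cnt+0}')

# Asterisk dropped: only values starting with 2012-04-14 match.
fixed=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" && $3 ~ /^2012-04-14/ {cnt++} END {print cnt+0}')

# Asterisk replaced by a space (assumes a space follows the date in $3).
spaced=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" && $3 ~ /^2012-04-14 / {cnt++} END {print cnt+0}')

# Plain string comparison on the date part.
exact=$(printf '%s\n' "$sample" | awk -F, '$6 == "VLY278" && substr($3, 1, 10) == "2012-04-14" {cnt++} END {print cnt+0}')

echo "buggy=$buggy fixed=$fixed spaced=$spaced exact=$exact"   # buggy=3 fixed=1 spaced=1 exact=1
```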