counting lines containing two column field values with awk

origamisven · June 22, 2011, 5:52am

Hello everybody,
I'm trying to count the number of consecutive lines in a text file that share the same values in two particular column fields. Such lines may appear in several separate blocks within the file, but I only want a single block to be counted.

This was my first approach to tackle the problem (I'm a beginner, so be gentle :D) ...

# find line in which pattern INT appears for the first time in file topology

NR_1st_int=$(awk -v var="$INT" '$0 ~ var {print NR; exit}' topology)

# print molecule number of field 5 in that line into MOL

MOL_int=$(awk -v line_int="$NR_1st_int" 'NR==line_int {print $5}' topology)

# count number of lines that INT and MOL appear IN THE FILE

NINT=$(awk -v var1="$INT" -v var2="$MOL_int" 'BEGIN { count=0 } { if (( $4 == var1 ) && ( $5 == var2 )) count++} END{print count}' topology)

The main problem is that this (bad) code does not restrict the number NINT to a single block of occurring lines, which I need.

I've tried working with some loops but I messed it up every time. Can you help me out?

panyam · June 22, 2011, 6:03am

Can you please post sample input and the expected output? I believe sample data would explain the problem better.

panyam · June 22, 2011, 6:57am

So the output you are expecting is

ABC -> 3 times,

SOL -> 3 times?

Put plainly: count only the first occurrence of SOL (keeping in mind that column 1 should be unique).

yazu · June 22, 2011, 7:41am

% VAR=SOL;  awk '/'"$VAR"'/ { while (match($0, "'"$VAR"'"))
                { s = s $0 "\n";  if ((getline) <= 0) break } printf "%s", s; exit}' testfile | wc -l
3

yazu · June 22, 2011, 11:22am

s starts out empty (an unset awk variable is treated as ""). Each matching
line is appended to it, followed by a newline character. getline then reads
the next line of input and makes it the current line ($0). The loop repeats
for as long as the current line matches the value of $VAR.
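The same idea can probably be written without getline, by letting awk's normal line loop do the reading and stopping at the first line that no longer matches. This is only a sketch: the function name count_first_block is made up for illustration, and it assumes that a "block" means a run of consecutive lines matching the pattern anywhere in the line.

```shell
# count_first_block PATTERN FILE   (hypothetical helper, not from the thread)
# Prints the number of lines in the first run of consecutive lines
# that match PATTERN (treated as an awk regular expression).
count_first_block() {
    awk -v pat="$1" '
        $0 ~ pat { n++; next }    # inside a matching run: count the line
        n > 0    { exit }         # first non-matching line after the run: stop reading
        END      { print n + 0 }  # "+ 0" prints 0 when nothing matched
    ' "$2"
}
```

It would be called as count_first_block SOL testfile. Since exit still runs the END rule, the count is printed once, and there is no need to collect the lines in a string and pipe them to wc -l.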

I don't quite understand your task but maybe this can help:

for ATOM in SOL GLN ; do
{
        awk '/'"$ATOM"'/ { while (match($0, "'"$ATOM"'"))
                { s = s $0 "\n";  if ((getline) <= 0) break } printf "%s", s; exit}' | wc -l
} < testfile
done
12
6

where testfile is the file with your data.
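If the match has to be on the column fields from the first post rather than anywhere in the line, something like the following might be closer. It is a rough sketch under assumptions: field 4 holds the name ($INT) and field 5 the molecule number, as in the original attempt, and the function name count_block is invented here.

```shell
# count_block NAME FILE   (hypothetical helper, not from the thread)
# Finds the first line whose field 4 equals NAME, remembers field 5 of that
# line, and counts the consecutive lines that keep both values.
count_block() {
    awk -v name="$1" '
        !found && $4 == name { found = 1; mol = $5 }  # first hit fixes the molecule number
        found {
            if ($4 == name && $5 == mol) n++          # still inside the block
            else exit                                 # block ended: stop reading
        }
        END { print n + 0 }                           # prints 0 when NAME never occurs
    ' "$2"
}
```

With that, NINT=$(count_block "$INT" topology) would replace the three separate awk calls, and later blocks with the same two values are ignored because awk exits at the end of the first one.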

Sorry for my English.