Computing the ratio of matching columns in two files using an awk script

coder83 · February 13, 2012, 10:33am

Thanks, Bartus11, for your help with the following code, which compares the two files "t1" and "t2".

awk 'NR==FNR{a[$2]=1;next}$2 in a{print $2}' t1 t2

First, can anyone explain the purpose of assigning a[$2]=1?

Second, the current script prints the matching column values from the first and second files ("t1" and "t2"), but I want to print only the number of matching columns. In other words, if only one column matches between "t1" and "t2", it should print "1" instead of the value of that column ("real_name").

Can anyone suggest what change to the above code is needed to get the desired output?

Input t1:

     7 real_name
     8 pa_name
     9 make_server_info_pw
     9 passon
    11 mapped_name
    11 nt_status
    13 passon
    15 p
    17 server_info
    18 p

Input t2:

1 CHECK_DECLS
1 True
1 conf
1 headers
1 reverse
1 real_name

Current output:

real_name

Desired output:

1

Chubler_XL · February 13, 2012, 11:20pm

There is no real purpose to assigning a[$2]=1: the value is never used, because "$2 in a" only checks whether the key exists, and any reference to a[$2] creates the array entry. The following works just as well (the only difference is that each array element holds an empty value instead of 1):

awk 'NR==FNR{a[$2];next}$2 in a{print $2}' t1 t2
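To see that the stored value really does not matter, here is a small sketch that runs the value-less version on a trimmed copy of the thread's t1/t2 data and also prints what a[$2] holds. The temporary directory and the function names matches and show_value are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Sketch: trimmed copies of the thread's sample files in a temp directory (assumed setup).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '7 real_name' '8 pa_name' '9 passon' > "$dir/t1"
printf '%s\n' '1 CHECK_DECLS' '1 True' '1 real_name' > "$dir/t2"

# Referencing a[$2] on its own creates the key with an empty value;
# "$2 in a" only tests that the key exists, so no "=1" is needed.
matches() {
  awk 'NR==FNR{a[$2];next} $2 in a{print $2}' "$1" "$2"
}

# Prints each matched key with its stored value, which is the empty string here.
show_value() {
  awk 'NR==FNR{a[$2];next} $2 in a{printf "%s=[%s]\n", $2, a[$2]}' "$1" "$2"
}

matches "$dir/t1" "$dir/t2"     # prints: real_name
show_value "$dir/t1" "$dir/t2"  # prints: real_name=[]
```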

The following displays the number of matches between the two files:

awk 'NR==FNR{a[$2];next}$2 in a{b[$2]}END{print length(b)}' t1 t2

b[] is populated only when $2 from t2 was stored in a[] while t1 was being processed.
length(b) is the total number of elements in b[], i.e. the number of distinct matching values (a value that appears on several lines of t2 is counted once).
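If an awk rejects length() on an array, the same distinct count can be taken by walking the keys of b[] in the END block. This is a sketch under assumptions: the sample files are hypothetical (t2 deliberately repeats real_name), and the function name distinct_matches is illustrative.

```shell
#!/bin/sh
# Sketch: hypothetical sample data; real_name appears twice in t2 on purpose.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '7 real_name' '8 pa_name' '9 passon' > "$dir/t1"
printf '%s\n' '1 CHECK_DECLS' '1 passon' '1 real_name' '1 real_name' > "$dir/t2"

distinct_matches() {
  awk '
    NR==FNR { a[$2]; next }   # first file: remember every column-2 value
    $2 in a { b[$2] }         # second file: record each matched value once
    END {
      n = 0
      for (k in b) n++        # count the keys by hand instead of calling length(b)
      print n
    }' "$1" "$2"
}

distinct_matches "$dir/t1" "$dir/t2"   # prints: 2 (passon and real_name)
```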

coder83 · February 14, 2012, 6:02am

Thanks, Chubler_XL.

The script does not work; it gives the following error message:

illegal reference to array b

Chubler_XL · February 14, 2012, 11:45am

You could try gawk if you have it (length() on an array is a gawk extension that some older awks reject). Or, as you only need the count, just sum the matches as you go:

awk 'NR==FNR{a[$2];next}$2 in a{c++}END{print c+0}' t1 t2
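Note that summing as you go counts every matching line of t2, so a value repeated in t2 is counted each time, whereas the b[] version counts distinct values. The sketch below contrasts the two and shows a distinct counter that still avoids length(). The sample files and function names are hypothetical; "c+0" makes awk print 0 rather than an empty line when nothing matches.

```shell
#!/bin/sh
# Sketch: hypothetical files; t3 shares no column-2 values with t1.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '7 real_name' '9 passon' > "$dir/t1"
printf '%s\n' '1 passon' '1 real_name' '1 real_name' > "$dir/t2"
printf '%s\n' '1 CHECK_DECLS' '1 True' > "$dir/t3"

# Counts every matching line of the second file, duplicates included.
line_matches() {
  awk 'NR==FNR{a[$2];next} $2 in a{c++} END{print c+0}' "$1" "$2"
}

# Counts each matching value once: increment only the first time it is seen.
distinct_matches() {
  awk 'NR==FNR{a[$2];next} ($2 in a) && !($2 in b){b[$2]; c++} END{print c+0}' "$1" "$2"
}

line_matches "$dir/t1" "$dir/t2"      # prints: 3
distinct_matches "$dir/t1" "$dir/t2"  # prints: 2
line_matches "$dir/t1" "$dir/t3"      # prints: 0
```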

coder83 · February 15, 2012, 5:59am

Thanks, Chubler_XL.

It worked.