Computing the ratio of similar columns in two files using an awk script

Thanks, Bartus11, for your help with the following code, which compares the two files "t1" and "t2".

awk 'NR==FNR{a[$2]=1;next}$2 in a{print $2}' t1 t2

First, can anyone explain the purpose of assigning a[$2]=1?

Second, the current script prints the column values that match between the first and second files, "t1" and "t2", but I want to print only the number of matched columns. In other words, if only one column value matches between "t1" and "t2", it should print "1" instead of the value of the column, "real_name".

Can anyone please suggest what kind of amendment is required in the above code to achieve the desired output?

Input t1:

     7 real_name
     8 pa_name
     9 make_server_info_pw
     9 passon
    11 mapped_name
    11 nt_status
    13 passon
    15 p
    17 server_info
    18 p

Input t2:

1 CHECK_DECLS   
1 True   
1 conf   
1 headers   
1 reverse   
1 real_name
  

Current output:

real_name

Desired output:

1

Assigning 1 in a[$2]=1 serves no real purpose, since any reference to a[$2] creates the array entry. The following works just as well (the only difference is that each array element holds an empty value):

awk 'NR==FNR{a[$2];next}$2 in a{print $2}' t1 t2
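As a small illustration of that point, here is a sketch showing that a bare reference is enough to create an array element. The keys "x" and "y" are made up for the example and have nothing to do with t1 or t2.

```shell
# Referencing a["x"] without assigning anything still creates the element,
# so the "in" test finds it. "y" is never referenced, and testing with "in"
# does not create it, so that test fails.
created=$(awk 'BEGIN {
    a["x"]
    x = ("x" in a) ? "yes" : "no"
    y = ("y" in a) ? "yes" : "no"
    print x, y
}')
echo "$created"
```

This should print "yes no".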

The following displays the number of matches between the two files:

awk 'NR==FNR{a[$2];next}$2 in a{b[$2]}END{print length(b)}' t1 t2

b[] is populated only if $2 from t2 was inserted into a[] while file t1 was processed.
length(b) is the total number of elements in b[] (i.e. the count of distinct matches).
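Not every awk accepts an array as the argument to length(). A possible workaround, sketched here, is to count the elements of b[] with a for-in loop in the END block. The sample files are hypothetical stand-ins for t1 and t2, written to a temporary directory.

```shell
# Hypothetical sample data standing in for t1 and t2.
dir=$(mktemp -d)
printf '%s\n' '7 real_name' '8 pa_name' '9 passon' '13 passon' > "$dir/t1"
printf '%s\n' '1 conf' '1 real_name' '1 passon' '2 passon' > "$dir/t2"

# Same matching logic, but the END block counts the elements of b[]
# one by one rather than calling length(b).
matches=$(awk 'NR==FNR { a[$2]; next }
               $2 in a { b[$2] }
               END     { n = 0; for (k in b) n++; print n }' "$dir/t1" "$dir/t2")
echo "$matches"
rm -rf "$dir"
```

With this sample data the result is 2, because "passon" appears twice in the second file but is stored only once in b[].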


Thanks Chubler XL

The script is not working and gives the following error message:

illegal reference to array b

You could try gawk if you have it or, as you only need the count, just sum the matches as you go:

awk 'NR==FNR{a[$2];next}$2 in a{c++}END{print c}' t1 t2
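Two details are worth knowing about a running counter. It counts matching lines, so a name repeated in the second file is counted each time, and when nothing matches `print c` prints an empty line rather than 0. The sketch below shows one way to handle both. The sample files and the seen[] array are assumptions made for the example.

```shell
# Hypothetical sample data: "passon" appears twice in the second file.
dir=$(mktemp -d)
printf '%s\n' '7 real_name' '9 passon' > "$dir/t1"
printf '%s\n' '1 passon' '2 passon' '1 conf' > "$dir/t2"

# Counts matching lines; "c + 0" forces a numeric 0 when nothing matches.
line_count=$(awk 'NR==FNR { a[$2]; next }
                  $2 in a { c++ }
                  END     { print c + 0 }' "$dir/t1" "$dir/t2")

# Counts each matching name only once; seen[] remembers names already counted.
distinct_count=$(awk 'NR==FNR { a[$2]; next }
                      ($2 in a) && !seen[$2]++ { c++ }
                      END     { print c + 0 }' "$dir/t1" "$dir/t2")

echo "$line_count $distinct_count"
rm -rf "$dir"
```

With this data the first command gives 2 and the second gives 1. For the t1 and t2 shown in the question, both give 1.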

Thanks Chubler XL,

It worked :)