awk script to parse results from TWO files

I am trying to parse two files and get the rows whose values differ in one of the columns (column 3 in my case).

The data in the two files is as follows:

A.txt

abc   10    5   0  1  16
xyz    16    1   1  0  18  
efg    30    8   0   2  40
ijk      22    2   0   1  25

B.txt

abc   10    5   0  1  16
xyz    13    4   1  0  18  
efg    30    8   0   2  40
ijk      17    7   0   1  25

I am trying to get output like the following: if the col3 value in B.txt is greater than the col3 value in A.txt, print the corresponding col1 and the difference between the two col3 values.

xyz 3 ---> basically col1R2 (B.col3R2 - A.col3R2)
ijk 5 ---> basically col1R4 (B.col3R4 - A.col3R4)

I tried the following to get the rows that differ, but I am getting all of the B.txt rows:

awk  'FILENAME=="A.txt" {arr[$0]++} FILENAME=="B.txt"  {if ($3 > arr[$3]) {print $1 "\t" $3}}' A.txt B.txt

I think the if condition is not being evaluated properly. Any hint will be appreciated!

  • Roger

I think that the following would produce the output you expect:

awk 'FILENAME == "A.txt" { arr[$1]=$3 } FILENAME == "B.txt" && arr[$1] != $3 { print $1 ,$3 - arr[$1] }' A.txt B.txt

What I couldn't understand from your code is this statement:

arr[$0]++

$0 contains the entire current record (current line). What is your idea here?
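
A small experiment may help show why every row of B.txt is printed. The sketch below assumes the sample data from the question, recreated in a scratch directory with the whitespace normalized; the bracketed third output column is added only for illustration. Because arr is keyed by whole lines, a lookup such as arr[$3] finds nothing and yields an empty value, so $3 > arr[$3] holds for every row.

```shell
# Assumption: scratch directory and sample files recreated from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'abc 10 5 0 1 16' 'xyz 16 1 1 0 18' 'efg 30 8 0 2 40' 'ijk 22 2 0 1 25' > A.txt
printf '%s\n' 'abc 10 5 0 1 16' 'xyz 13 4 1 0 18' 'efg 30 8 0 2 40' 'ijk 17 7 0 1 25' > B.txt

# The original attempt, with the looked-up value shown in brackets.
attempt=$(awk '
    FILENAME == "A.txt" { arr[$0]++ }    # keys are whole lines of A.txt
    FILENAME == "B.txt" {
        # arr["5"], arr["4"], ... were never set, so the lookup is empty
        if ($3 > arr[$3]) print $1, $3, "[" arr[$3] "]"
    }
' A.txt B.txt)
printf '%s\n' "$attempt"
```

All four rows of B.txt come out, each with empty brackets, which matches the behaviour described in the question.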

awk 'NR==FNR{a[$1]=$3;next}$3>a[$1]{print $1,$3-a[$1]}' A.txt B.txt
xyz 3
ijk 5
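
For readers unfamiliar with the NR==FNR idiom, here is the same one-liner spread over several lines with comments. This is only a sketch: the sample files are recreated from the question in a scratch directory, and the variable name result is an assumption made for illustration. NR counts records across all input files while FNR restarts at 1 for each file, so the two are equal only while the first file is being read.

```shell
# Assumption: scratch directory and sample files recreated from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'abc 10 5 0 1 16' 'xyz 16 1 1 0 18' 'efg 30 8 0 2 40' 'ijk 22 2 0 1 25' > A.txt
printf '%s\n' 'abc 10 5 0 1 16' 'xyz 13 4 1 0 18' 'efg 30 8 0 2 40' 'ijk 17 7 0 1 25' > B.txt

result=$(awk '
    # First file (A.txt): remember column 3 under the column 1 key,
    # then skip the remaining rules for this line.
    NR == FNR { a[$1] = $3; next }

    # Second file (B.txt): print the key and the difference
    # only when column 3 here is larger than the stored value.
    $3 > a[$1] { print $1, $3 - a[$1] }
' A.txt B.txt)
printf '%s\n' "$result"
```

Unlike a test on FILENAME, this form does not depend on what the files are called, only on the order in which they are given.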

Thanks, yinyuemi and pflynn! Both solutions work!

pflynn, yes, you are right: I was defining the array wrongly. I thought it would give me just the col3 value to compare against, but I suppose it was taking the whole row and trying to compare that.

Could you guys explain this part of the code, { arr[$1]=$3 }?

Hi roger67,

arr[$1]=$3 

means: build up an array named "arr", indexed by $1, where the value stored under each index is the corresponding $3.

Best,

Y

arr[$1]=$3

What we are doing here is creating an array whose indexes are the contents of the first column of each line ("abc", "xyz", etc.) and whose values are the corresponding third column of each line. For example, arr["abc"] holds the value 5. Notice that this is done while we are reading file A.txt.

Once we are done reading A.txt, we start reading file B.txt. There we can use column 1 ($1) of each line as the index to retrieve the corresponding third column of A.txt from the array, and compare it to the third column of the current line ($3). If they are different, we print the first column and the difference:

arr[$1] != $3
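
To make this concrete, the sketch below dumps the array built from A.txt and then compares the != test with a > test. The file B2.txt is hypothetical, not part of the question: it is B.txt with the abc row changed so that column 3 is smaller than in A.txt. The variable names are also assumptions made for illustration.

```shell
# Assumption: scratch directory and sample files recreated from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'abc 10 5 0 1 16' 'xyz 16 1 1 0 18' 'efg 30 8 0 2 40' 'ijk 22 2 0 1 25' > A.txt
# Hypothetical variant of B.txt: column 3 of the abc row lowered from 5 to 3.
printf '%s\n' 'abc 10 3 0 1 16' 'xyz 13 4 1 0 18' 'efg 30 8 0 2 40' 'ijk 17 7 0 1 25' > B2.txt

# What the array holds after A.txt has been read.
# The order of "for (k in arr)" is unspecified, so the output is sorted.
contents=$(awk '{ arr[$1] = $3 } END { for (k in arr) print k, arr[k] }' A.txt | LC_ALL=C sort)
printf '%s\n' "$contents"

# "!=" reports every row that differs, including decreases.
differ=$(awk 'NR == FNR { arr[$1] = $3; next }
              arr[$1] != $3 { print $1, $3 - arr[$1] }' A.txt B2.txt)

# ">" reports only the rows where the second file has the larger value.
greater=$(awk 'NR == FNR { arr[$1] = $3; next }
               $3 > arr[$1] { print $1, $3 - arr[$1] }' A.txt B2.txt)

printf 'differ:\n%s\ngreater:\n%s\n' "$differ" "$greater"
```

With the files from the question both tests give the same two lines, since no column 3 value in B.txt is smaller than in A.txt. With the hypothetical B2.txt, the != version also prints abc -2, so the > version is the closer match to the stated requirement.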

Thanks guys !!