awk arrays comparing multiple columns across two files.

Hi, I'm trying to use awk arrays to compare values across two files based on multiple columns. I've attempted to load file 2 into an array and compare with values in file 1, but success has been absent. If anyone has any suggestions (and I'm not even sure if my script so far is on the right lines) it would be very much appreciated.

file1

A 1 10 20
B 1 35 50
C 2 40 50
D 2 65 100
E 3 10 30
F 3 20 40
G 4 25 50
H 4 45 70

file2

ID1 11 16 1
ID2 75 100 1
ID3 45 47 2
ID4 15 30 3
ID5 40 45 4
ID6 55 65 4

Essentially, if column 2 in file 1, and column 4 in file 2 match, continue.
Then, if column 2 in file 2 is >= column 3 in file 1
and column 3 in file 2 is <= column 4 in file 1
print out matching line of file 1, and column 1,2 and 3 from file2.

Desired output

A 1 10 20 ID1 11 16
C 2 40 50 ID3 45 47
E 3 10 30 ID4 15 30
G 4 25 50 ID5 40 45
H 4 45 70 ID6 55 65

Using a pseudo script based on similar problems online, I've got:

awk 'FNR == NR
{
f2[$0]++
next
}
{
for (i in f2)
{
split(i,f2_split)
if ((f2_split[4] == $2) && (f2_split[2] >= $3) && (f2_split[3] <= $4))
{print $0, f2_split[1],f2_split[2],f2_split[3]
}
}
}' file2 file1

This does a fantastic job of printing out the contents of file 2. Alas it's not what I was after. Any help would be much appreciated.

Try:

awk 'NR==FNR{a[NR]=$0;next}{for (i in a){split(a,x," ");if (x[4]==$2&&x[2]>=$3&&x[3]<=$4)print $0,x[1],x[2],x[3]}}' file2 file1
1 Like

Thanks, that worked a treat. Had a little play around with that code, and realised one of the issues with my initial code was the use of new lines in inappropriate places. If I put a line between the awk 'NR==FNR and the rest of your code, that just prints out file 2 as well.

awk 'NR==FNR
{a[NR]=$0;next}{for (i in a){split(a,x," ");if (x[4]==$2&&x[2]>=$3&&x[3]<=$4)print $0,x[1],x[2],x[3]}}' file2 file1

A valuable lesson that I'd previously overlooked. Thanks for your help.

You can put a newline like this:

awk 'NR==FNR{
a[NR]=$0;next}{for (i in a){split(a,x," ");if (x[4]==$2&&x[2]>=$3&&x[3]<=$4)print $0,x[1],x[2],x[3]}}' file2 file1
1 Like

Awesome, that's really useful. A rookie mistake no doubt but one that I won't make again any time soon, hopefully! Thanks