Compare files & extract column awk

I have two tab delimited files as given below:

File_1:
PV16    E1    865    2814    1950
PV16    E2    2756    3853    1098
PV16    E4    3333    3620    288
PV16    E5    3850    4101    252
PV16    E6    83    559    477
PV16    E7    562    858    297
PV16    L2    4237    5658    1422
PV16    L1    5639    7156    1518

These two files are of unequal line counts.

File_2 :
1099    PV16    766    837    
1099    PV16    904    975    
1099    PV16    1022    1118    
1099    PV17    1198    1269    
1099    PV16    1438    1531    
1099    PV16    1572    1643    
1099    PV16    1952    2023    
1099    PV16    2070    2211    
1099    PV16    2322    2393    
1099    PV16    2432    2503    
1099    PV16    2545    2616    
1099    PV16    2808    2884    
1099    PV16    3050    3121    
1099    PV16    3507    3693    
1099    PV16    3696    3767    
1099    PV16    4036    4107    
1099    PV19    4246    4317    
1099    PV20    4383    4454    
1099    PV18    4673    4797    
1099    PV16    4811    4882    

Logic to compare & add the extra column as :

if (file_1.col1==file_2.col2 && file_2.col3>=file_1.col3 && file_2.col4<=file_1.col4) 
then 
print "file_2.col1,"\t",1 file_2.col2,"\t",file_2.col3,"\t",file_2.col4,"\t",file_1.col2,"\n";
else
print "file_2.col,"\t",1 file_2.col2,"\t",file_2.col3,"\t",file_2.col4,"\t","Null","\n";

Ok. What you have tried so far?

Your logic doesn't say wether each line of file2 should be checked against every line of file1 or if we should process next line of file2 as soon as the condition it met.

I have tried 
awk '{getline buf <"file_1";split(buf,a," ");if($2==a[1] && $2>=a[3] && $3<=a[4]) print $0,"\t",a[2],"\n";}' file_2

Yes ,all the lines of file 2 to be compared with file1

Sorry , I forgot to ask for. Pls post the output you are expecting.

I'm not getting any output

But the expected output should be like :

1099    PV16    766    837  E7       1099    PV16    904    975  E1

Still it's not clear.

You want to do a line by comparison from file1 to file2?

something like the below sudo code?

for line1 in file1
do

file line2 in file2
do

if file1.fields <your comparison stuff> file2.fields --success
then
print file2 fields , file1 2nd column
else
print just file2 fields
fi

done

done

?

yes similar kind

Something like :

# awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{for(i=0;++i<=s;)
{n=split(L,F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {print $0 OFS F[2]}
}}' OFS="\t" File_1 File_2
1099    PV16    766     837     E7
1099    PV16    904     975     E1
1099    PV16    1022    1118    E1
1099    PV16    1438    1531    E1
1099    PV16    1572    1643    E1
1099    PV16    1952    2023    E1
1099    PV16    2070    2211    E1
1099    PV16    2322    2393    E1
1099    PV16    2432    2503    E1
1099    PV16    2545    2616    E1
1099    PV16    2808    2884    E2
1099    PV16    3050    3121    E2
1099    PV16    3507    3693    E2
1099    PV16    3696    3767    E2
1099    PV16    4811    4882    L2

Or are you also willing to display the lines for which the conditions are never met ?

# awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{for(i=0;++i<=s;)
{e="Null";n=split(L,F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {e=F[2]}
print $0 OFS e}
}' OFS="\t" File_1 File_2
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    766     837     E7
1099    PV16    766     837     Null
1099    PV16    766     837     Null
1099    PV16    904     975     E1
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    904     975     Null
1099    PV16    1022    1118    E1
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV16    1022    1118    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV17    1198    1269    Null
1099    PV16    1438    1531    E1
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1438    1531    Null
1099    PV16    1572    1643    E1
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1572    1643    Null
1099    PV16    1952    2023    E1
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    1952    2023    Null
1099    PV16    2070    2211    E1
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2070    2211    Null
1099    PV16    2322    2393    E1
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2322    2393    Null
1099    PV16    2432    2503    E1
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2432    2503    Null
1099    PV16    2545    2616    E1
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2545    2616    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    E2
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    2808    2884    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    E2
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3050    3121    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    E2
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3507    3693    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    E2
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    3696    3767    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV16    4036    4107    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV19    4246    4317    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV20    4383    4454    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV18    4673    4797    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    Null
1099    PV16    4811    4882    L2
1099    PV16    4811    4882    Null

Just another piece of code you may wanna try in case you want all lines display at least once (those who never match will be display once an those who match several line from file1 will be displayed everytime it matches the condition.

So if we add for example a line "G2" like :

# head -2 File_1
PV16    E1    865    2814    1950
PV16    G2    865    2814    1950

The code

awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{f=0;for(i=0;++i<=s;)
{e="Null";n=split(L,F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {e=F[2];print $0 OFS e;f=1}
};if(!f) print $0 OFS e
}' OFS="\t" File_1 File_2

will give

1099    PV16    766     837     E7
1099    PV16    904     975     E1
1099    PV16    904     975     G2
1099    PV16    1022    1118    E1
1099    PV16    1022    1118    G2
1099    PV17    1198    1269    Null
1099    PV16    1438    1531    E1
1099    PV16    1438    1531    G2
1099    PV16    1572    1643    E1
1099    PV16    1572    1643    G2
1099    PV16    1952    2023    E1
1099    PV16    1952    2023    G2
1099    PV16    2070    2211    E1
1099    PV16    2070    2211    G2
1099    PV16    2322    2393    E1
1099    PV16    2322    2393    G2
1099    PV16    2432    2503    E1
1099    PV16    2432    2503    G2
1099    PV16    2545    2616    E1
1099    PV16    2545    2616    G2
1099    PV16    2808    2884    E2
1099    PV16    3050    3121    E2
1099    PV16    3507    3693    E2
1099    PV16    3696    3767    E2
1099    PV16    4036    4107    Null
1099    PV19    4246    4317    Null
1099    PV20    4383    4454    Null
1099    PV18    4673    4797    Null
1099    PV16    4811    4882    L2
1 Like

:b: It worked as expected. Hey ctsgnb Thanks a lot :).
I'm very much interested to learn such multicolumn comparison tasks especially if there are multiple file and multiple columns. Please recommend some online tutorials links to understand in depth about awk implementation.

If you are interested in learning awk :

You can read "Awk - A Tutorial and Introduction - by Bruce Barnett" (the PDF can be found for free quite easy over the web)

... and then... practice and practice !