I have two tab delimited files as given below:
File_1:
PV16 E1 865 2814 1950
PV16 E2 2756 3853 1098
PV16 E4 3333 3620 288
PV16 E5 3850 4101 252
PV16 E6 83 559 477
PV16 E7 562 858 297
PV16 L2 4237 5658 1422
PV16 L1 5639 7156 1518
These two files are of unequal line counts.
File_2 :
1099 PV16 766 837
1099 PV16 904 975
1099 PV16 1022 1118
1099 PV17 1198 1269
1099 PV16 1438 1531
1099 PV16 1572 1643
1099 PV16 1952 2023
1099 PV16 2070 2211
1099 PV16 2322 2393
1099 PV16 2432 2503
1099 PV16 2545 2616
1099 PV16 2808 2884
1099 PV16 3050 3121
1099 PV16 3507 3693
1099 PV16 3696 3767
1099 PV16 4036 4107
1099 PV19 4246 4317
1099 PV20 4383 4454
1099 PV18 4673 4797
1099 PV16 4811 4882
Logic to compare & add the extra column as :
if (file_1.col1==file_2.col2 && file_2.col3>=file_1.col3 && file_2.col4<=file_1.col4)
then
print "file_2.col1,"\t",1 file_2.col2,"\t",file_2.col3,"\t",file_2.col4,"\t",file_1.col2,"\n";
else
print "file_2.col,"\t",1 file_2.col2,"\t",file_2.col3,"\t",file_2.col4,"\t","Null","\n";
panyam
March 18, 2013, 7:12am
2
Ok. What you have tried so far?
ctsgnb
March 18, 2013, 7:35am
3
Your logic doesn't say wether each line of file2 should be checked against every line of file1 or if we should process next line of file2 as soon as the condition it met.
I have tried
awk '{getline buf <"file_1";split(buf,a," ");if($2==a[1] && $2>=a[3] && $3<=a[4]) print $0,"\t",a[2],"\n";}' file_2
Yes ,all the lines of file 2 to be compared with file1
panyam
March 18, 2013, 8:18am
5
Sorry , I forgot to ask for. Pls post the output you are expecting.
I'm not getting any output
But the expected output should be like :
1099 PV16 766 837 E7 1099 PV16 904 975 E1
panyam
March 18, 2013, 8:59am
7
Still it's not clear.
You want to do a line by comparison from file1 to file2?
something like the below sudo code?
for line1 in file1
do
file line2 in file2
do
if file1.fields <your comparison stuff> file2.fields --success
then
print file2 fields , file1 2nd column
else
print just file2 fields
fi
done
done
?
ctsgnb
March 18, 2013, 12:35pm
9
Something like :
# awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{for(i=0;++i<=s;)
{n=split(L,F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {print $0 OFS F[2]}
}}' OFS="\t" File_1 File_2
1099 PV16 766 837 E7
1099 PV16 904 975 E1
1099 PV16 1022 1118 E1
1099 PV16 1438 1531 E1
1099 PV16 1572 1643 E1
1099 PV16 1952 2023 E1
1099 PV16 2070 2211 E1
1099 PV16 2322 2393 E1
1099 PV16 2432 2503 E1
1099 PV16 2545 2616 E1
1099 PV16 2808 2884 E2
1099 PV16 3050 3121 E2
1099 PV16 3507 3693 E2
1099 PV16 3696 3767 E2
1099 PV16 4811 4882 L2
Or are you also willing to display the lines for which the conditions are never met ?
# awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{for(i=0;++i<=s;)
{e="Null";n=split(L,F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {e=F[2]}
print $0 OFS e}
}' OFS="\t" File_1 File_2
1099 PV16 766 837 Null
1099 PV16 766 837 Null
1099 PV16 766 837 Null
1099 PV16 766 837 Null
1099 PV16 766 837 Null
1099 PV16 766 837 E7
1099 PV16 766 837 Null
1099 PV16 766 837 Null
1099 PV16 904 975 E1
1099 PV16 904 975 Null
1099 PV16 904 975 Null
1099 PV16 904 975 Null
1099 PV16 904 975 Null
1099 PV16 904 975 Null
1099 PV16 904 975 Null
1099 PV16 904 975 Null
1099 PV16 1022 1118 E1
1099 PV16 1022 1118 Null
1099 PV16 1022 1118 Null
1099 PV16 1022 1118 Null
1099 PV16 1022 1118 Null
1099 PV16 1022 1118 Null
1099 PV16 1022 1118 Null
1099 PV16 1022 1118 Null
1099 PV17 1198 1269 Null
1099 PV17 1198 1269 Null
1099 PV17 1198 1269 Null
1099 PV17 1198 1269 Null
1099 PV17 1198 1269 Null
1099 PV17 1198 1269 Null
1099 PV17 1198 1269 Null
1099 PV17 1198 1269 Null
1099 PV16 1438 1531 E1
1099 PV16 1438 1531 Null
1099 PV16 1438 1531 Null
1099 PV16 1438 1531 Null
1099 PV16 1438 1531 Null
1099 PV16 1438 1531 Null
1099 PV16 1438 1531 Null
1099 PV16 1438 1531 Null
1099 PV16 1572 1643 E1
1099 PV16 1572 1643 Null
1099 PV16 1572 1643 Null
1099 PV16 1572 1643 Null
1099 PV16 1572 1643 Null
1099 PV16 1572 1643 Null
1099 PV16 1572 1643 Null
1099 PV16 1572 1643 Null
1099 PV16 1952 2023 E1
1099 PV16 1952 2023 Null
1099 PV16 1952 2023 Null
1099 PV16 1952 2023 Null
1099 PV16 1952 2023 Null
1099 PV16 1952 2023 Null
1099 PV16 1952 2023 Null
1099 PV16 1952 2023 Null
1099 PV16 2070 2211 E1
1099 PV16 2070 2211 Null
1099 PV16 2070 2211 Null
1099 PV16 2070 2211 Null
1099 PV16 2070 2211 Null
1099 PV16 2070 2211 Null
1099 PV16 2070 2211 Null
1099 PV16 2070 2211 Null
1099 PV16 2322 2393 E1
1099 PV16 2322 2393 Null
1099 PV16 2322 2393 Null
1099 PV16 2322 2393 Null
1099 PV16 2322 2393 Null
1099 PV16 2322 2393 Null
1099 PV16 2322 2393 Null
1099 PV16 2322 2393 Null
1099 PV16 2432 2503 E1
1099 PV16 2432 2503 Null
1099 PV16 2432 2503 Null
1099 PV16 2432 2503 Null
1099 PV16 2432 2503 Null
1099 PV16 2432 2503 Null
1099 PV16 2432 2503 Null
1099 PV16 2432 2503 Null
1099 PV16 2545 2616 E1
1099 PV16 2545 2616 Null
1099 PV16 2545 2616 Null
1099 PV16 2545 2616 Null
1099 PV16 2545 2616 Null
1099 PV16 2545 2616 Null
1099 PV16 2545 2616 Null
1099 PV16 2545 2616 Null
1099 PV16 2808 2884 Null
1099 PV16 2808 2884 E2
1099 PV16 2808 2884 Null
1099 PV16 2808 2884 Null
1099 PV16 2808 2884 Null
1099 PV16 2808 2884 Null
1099 PV16 2808 2884 Null
1099 PV16 2808 2884 Null
1099 PV16 3050 3121 Null
1099 PV16 3050 3121 E2
1099 PV16 3050 3121 Null
1099 PV16 3050 3121 Null
1099 PV16 3050 3121 Null
1099 PV16 3050 3121 Null
1099 PV16 3050 3121 Null
1099 PV16 3050 3121 Null
1099 PV16 3507 3693 Null
1099 PV16 3507 3693 E2
1099 PV16 3507 3693 Null
1099 PV16 3507 3693 Null
1099 PV16 3507 3693 Null
1099 PV16 3507 3693 Null
1099 PV16 3507 3693 Null
1099 PV16 3507 3693 Null
1099 PV16 3696 3767 Null
1099 PV16 3696 3767 E2
1099 PV16 3696 3767 Null
1099 PV16 3696 3767 Null
1099 PV16 3696 3767 Null
1099 PV16 3696 3767 Null
1099 PV16 3696 3767 Null
1099 PV16 3696 3767 Null
1099 PV16 4036 4107 Null
1099 PV16 4036 4107 Null
1099 PV16 4036 4107 Null
1099 PV16 4036 4107 Null
1099 PV16 4036 4107 Null
1099 PV16 4036 4107 Null
1099 PV16 4036 4107 Null
1099 PV16 4036 4107 Null
1099 PV19 4246 4317 Null
1099 PV19 4246 4317 Null
1099 PV19 4246 4317 Null
1099 PV19 4246 4317 Null
1099 PV19 4246 4317 Null
1099 PV19 4246 4317 Null
1099 PV19 4246 4317 Null
1099 PV19 4246 4317 Null
1099 PV20 4383 4454 Null
1099 PV20 4383 4454 Null
1099 PV20 4383 4454 Null
1099 PV20 4383 4454 Null
1099 PV20 4383 4454 Null
1099 PV20 4383 4454 Null
1099 PV20 4383 4454 Null
1099 PV20 4383 4454 Null
1099 PV18 4673 4797 Null
1099 PV18 4673 4797 Null
1099 PV18 4673 4797 Null
1099 PV18 4673 4797 Null
1099 PV18 4673 4797 Null
1099 PV18 4673 4797 Null
1099 PV18 4673 4797 Null
1099 PV18 4673 4797 Null
1099 PV16 4811 4882 Null
1099 PV16 4811 4882 Null
1099 PV16 4811 4882 Null
1099 PV16 4811 4882 Null
1099 PV16 4811 4882 Null
1099 PV16 4811 4882 Null
1099 PV16 4811 4882 L2
1099 PV16 4811 4882 Null
Just another piece of code you may wanna try in case you want all lines display at least once (those who never match will be display once an those who match several line from file1 will be displayed everytime it matches the condition.
So if we add for example a line "G2" like :
# head -2 File_1
PV16 E1 865 2814 1950
PV16 G2 865 2814 1950
The code
awk '{$1=$1}NR==FNR{s=NR;L[NR]=$0;next}{f=0;for(i=0;++i<=s;)
{e="Null";n=split(L,F," ");
if ((F[1]==$2)&&($3>=F[3])&&($4<=F[4])) {e=F[2];print $0 OFS e;f=1}
};if(!f) print $0 OFS e
}' OFS="\t" File_1 File_2
will give
1099 PV16 766 837 E7
1099 PV16 904 975 E1
1099 PV16 904 975 G2
1099 PV16 1022 1118 E1
1099 PV16 1022 1118 G2
1099 PV17 1198 1269 Null
1099 PV16 1438 1531 E1
1099 PV16 1438 1531 G2
1099 PV16 1572 1643 E1
1099 PV16 1572 1643 G2
1099 PV16 1952 2023 E1
1099 PV16 1952 2023 G2
1099 PV16 2070 2211 E1
1099 PV16 2070 2211 G2
1099 PV16 2322 2393 E1
1099 PV16 2322 2393 G2
1099 PV16 2432 2503 E1
1099 PV16 2432 2503 G2
1099 PV16 2545 2616 E1
1099 PV16 2545 2616 G2
1099 PV16 2808 2884 E2
1099 PV16 3050 3121 E2
1099 PV16 3507 3693 E2
1099 PV16 3696 3767 E2
1099 PV16 4036 4107 Null
1099 PV19 4246 4317 Null
1099 PV20 4383 4454 Null
1099 PV18 4673 4797 Null
1099 PV16 4811 4882 L2
1 Like
It worked as expected. Hey ctsgnb Thanks a lot :).
I'm very much interested to learn such multicolumn comparison tasks especially if there are multiple file and multiple columns. Please recommend some online tutorials links to understand in depth about awk implementation.
ctsgnb
March 19, 2013, 9:50am
11
If you are interested in learning awk :
You can read "Awk - A Tutorial and Introduction - by Bruce Barnett" (the PDF can be found for free quite easy over the web)
... and then... practice and practice !