Join not working for comparision

Hi All,
I have 2 files where the first column of both the files have to be compared and if they match the first six columns of the first file to be extracted in the output file.

Format of files :

File1 : ${SHTEMP}NPBR5.XTR.tmp

S00016678|129|7|MPF|20090106|E
S00016678|129|7|MPF|20090205|E
S00016678|129|7|MPF|20090305|E
S00016678|129|7|MPF|20090402|E
S00016678|129|7|MPF|20090504|E
S00016678|129|7|MPF|20090602|E
S00016678|129|7|MPF|20090702|E
S00016678|129|7|MPF|20090804|E
S00016678|129|7|MPF|20090902|E
S00016678|129|7|MPF|20091006|E
S00016678|129|7|MPF|20091103|E
S00016678|129|7|MPF|20091203|E
S00017768|45|7|MPF|20081208|E
S00017768|45|7|MPF|20090113|E
S00017768|45|7|MPF|20090218|E
S00017768|90|7|MPF|20090309|E
S00017768|45|7|MPF|20090506|E
S00017768|45|7|MPF|20090605|E
S00017768|45|7|MPF|20090708|E
S00017768|45|3|MPF|20090811|E
S00017768|-45|3|MPF|20090811|E
S00017768|45|7|MPF|20090812|E
S00017768|45|7|MPF|20090915|E
S00017768|45|7|MPF|20091005|E
S00017768|45|7|MPF|20091104|E
S00017768|45|7|MPF|20091210|E
S00049990|258|7|MPF|20090408|E
S00049990|129|7|MPF|20090513|E
S00049990|129|7|MPF|20090612|E
S00049990|129|7|MPF|20090710|E
S00049990|129|7|MPF|20090812|E
S00049990|129|7|MPF|20090909|E
S00049990|129|7|MPF|20091009|E
S00049990|129|7|MPF|20091110|E
S00049990|129|7|MPF|20091210|E
S00059081|138|7|MPF|20080311|E
S00059081|133|7|MPF|20080331|E
S00059081|133|7|MPF|20081211|E
S00059081|128|7|MPF|20090106|E
S00059081|5|7|MPF|20090203|E
S00059081|123|7|MPF|20090203|E
S00059081|5|7|MPF|20090310|E
S00059081|60|7|MPF|20090310|E
S00059081|128|3|MPF|20090421|E
S00059081|5|3|MPF|20090421|E
S00059081|68|7|MPF|20090407|E

File 2 :${SHTEMP}NPBR1.XTR.final

S00016678|MIDDL|0|MR|019221521A||RL|STD|0|0||E
S00017768|ESSEX|0|MR|018163650A||R|STD|0|0||E
S00059081|PLYMO|0|MR|024266906A||VR|STD|0|0||E
S00001153|MIDDL|0|MR|018099911D||P|STD|0|0||E
S00001156|NORFO|0|MR|010225966A||P|STD|0|0||E
S00001167|MIDDL|0|MR|014143292A||P|STD|0|0||E
S00001263|MIDDL|0|MR|020105100A||P3|STD|0|0||E
S00001283|MIDDL|0|MR|021203506A||P|STD|0|0||E
S00001309|MIDDL|0|MR|010261774A||P|STD|0|0||E
S00001321|MIDDL|0|MR|020206784A||P|STD|0|0||E
S00001382|MIDDL|0|MR|025206271A||P|STD|0|0||E
S00001401|MIDDL|0|MR|030243897A||P|STD|0|0||E
S00001416|MIDDL|0|MR|029071547A||P|STD|0|0||E
S00001423|MIDDL|0|MR|027120380D||R3|STD|0|0||E
S00001441|1044|0|MR|002224916A||U|EG GIC SRX|0|0||E
S00001460|NORFO|0|MR|021208553D||RK|STD|0|0||E
S00001485|1143|0|MR|029127847D||U|EG SRX|0|0||E
S00001493|ESSEX|0|MR|028205925D||RK|STD|0|0||E

Code tried :

join -t"|" -1 1 -2 1 -o 1.1 1.2 1.3 1.4 1.5 1.6 ${SHTEMP}NPBR5.XTR.tmp ${SHTEMP}NPBR1.XTR.final > ${SHTEMP}NPBR5.XTR.tmp1

Is there a way in awk or sed to do this.

awk -F\| 'NR==FNR{A[$1]=$0;next}$1 in A{print A[$1]}' ${SHTEMP}NPBR5.XTR.tmp ${SHTEMP}NPBR1.XTR.final
1 Like

Thanks! Is there a way to include field 3 from file 2 in the output file.

Existing code :

join -t"|" -1 1 -2 1 -o 1.1 1.2 1.3 1.4 1.5 1.6 2.3 ${SHTEMP}NPBR5.XTR.tmp ${SHTEMP}NPBR1.XTR.final > ${SHTEMP}NPBR5.XTR.tmp1
awk -F\| 'NR==FNR{A[$1]=$0;next}$1 in A{print A[$1],$3}' OFS=\| ${SHTEMP}NPBR5.XTR.tmp ${SHTEMP}NPBR1.XTR.final
1 Like