The first awk command below looks up the ids from file1 in $2 of file2 and prints each line of file2 that matches. If an id from file1 is not found in file2 (like BMPR2 in line 4 of file1), I cannot figure out how to add it to the output as missing, following the same format: the next sequential number in $1, the id from file1 in $2, and the word missing in $3. My attempt is the modified awk below, which does execute, but its output is all of file2 rather than the desired output, and I am not sure why. There may be multiple missing ids in my actual data, but the files are always in the same format as below. Thank you :).
file1 (tab-delimited)
ABCA3
ACVRL1
BMPR1B
BMPR2
CAV1
file2 (tab-delimited)
20 ABCA3 100.00
101 ACVRL1 100.00
596 BMPR1B 100.00
597 BMPR3 100.00
733 CAV1 100.00
734 CAV3 100.00
735 CBFB 100.00
736 CBL 100.00
737 CBLB 100.00
738 CBR1 100.00
awk
awk -F'\t' 'NR==FNR{A[$1];next}$2 in A' file1 file2
output of the command (BMPR2 is not found, so it is not printed)
20 ABCA3 100.00
101 ACVRL1 100.00
596 BMPR1B 100.00
733 CAV1 100.00
modified awk
awk -F'\t' 'NR==FNR{a[$1]; next}$2 in a{delete a[$2]}
END{for(i in a) print ++FNR,i,"missing"}1' file1 OFS='\t' file2
desired output (tab-delimited)
1 ABCA3 100.00
2 ACVRL1 100.00
3 BMPR1B 100.00
4 CAV1 100.00
5 BMPR2 missing
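For reference, the desired output above could be produced along the following lines. This is only a minimal sketch under stated assumptions, not a tested fix for the real data: it assumes both files are tab-delimited exactly as shown, that each id appears once in file1, and that missing ids should be listed in file1 order. The array names want, order, and seen, the counters n and c, and the output file name out.tsv are made up for illustration.

```shell
# Recreate the sample inputs from the post (tab-delimited).
printf '%s\n' ABCA3 ACVRL1 BMPR1B BMPR2 CAV1 > file1
printf '%s\t%s\t%s\n' \
    20 ABCA3 100.00   101 ACVRL1 100.00  596 BMPR1B 100.00 \
    597 BMPR3 100.00  733 CAV1 100.00    734 CAV3 100.00 \
    735 CBFB 100.00   736 CBL 100.00     737 CBLB 100.00 \
    738 CBR1 100.00 > file2

awk -F'\t' -v OFS='\t' '
# file1: remember each id and the order in which it was listed
NR == FNR { want[$1]; order[++n] = $1; next }

# file2: for a wanted id, replace $1 with a running counter and print
$2 in want { seen[$2]; $1 = ++c; print }

# after both files: report ids from file1 that never appeared in file2
END {
    for (k = 1; k <= n; k++)
        if (!(order[k] in seen))
            print ++c, order[k], "missing"
}' file1 file2 > out.tsv

cat out.tsv
```

Two differences from the modified awk are worth noting. There is no trailing 1 pattern; that pattern is always true, so it prints every line of file2 regardless of the match. The counter c is also kept separately from FNR, because FNR at END holds the number of lines read from file2 rather than the number of lines printed.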