Trying to use awk
to:
update $2
in file2
with the $2
value in file1
, if $1
in file1
matches $13
in file2
, which is tab-delimeted
. The $2
values may already be the same so in that case nothing happens and the next line is processed.
There are exactly 4,605
unique $13
values. Thank you :).
file1
MECP2 NM_004992.3
CBLB NM_170662.3
file2
173 NM_170662.3 chr3 - 105377108 105587887 105377813 105586421 19 105377108,105389076,105397274,105400322,105400567,105404163,105412337,105420937,105422831,105438890,105452852,105456014,105459337,105464760,105470305,105495239,105572257,105586253,105587579, 105378073,105389196,105397415,105400454,105400662,105404310,105412432,105421303,105423017,105439094,105452984,105456102,105459475,105464882,105470462,105495386,105572508,105586435,105587887, 0 CBLB cmpl cmpl 1,1,1,1,2,2,0,0,0,0,0,2,2,0,2,2,0,0,-1,
219 NM_001110792.1 chrX - 153295685 153363188 153295817 153363122 3 153295685,153297657,153363060, 153296901,153298008,153363188, 0 MECP2 cmpl cmpl 2,2,0,
desired output
173 NM_170662.3 chr3 - 105377108 105587887 105377813 105586421 19 105377108,105389076,105397274,105400322,105400567,105404163,105412337,105420937,105422831,105438890,105452852,105456014,105459337,105464760,105470305,105495239,105572257,105586253,105587579, 105378073,105389196,105397415,105400454,105400662,105404310,105412432,105421303,105423017,105439094,105452984,105456102,105459475,105464882,105470462,105495386,105572508,105586435,105587887, 0 CBLB cmpl cmpl 1,1,1,1,2,2,0,0,0,0,0,2,2,0,2,2,0,0,-1,
219 NM_004992.3 chrX - 153295685 153363188 153295817 153363122 3 153295685,153297657,153363060, 153296901,153298008,153363188, 0 MECP2 cmpl cmpl 2,2,0,
awk
awk 'FNR==NR { a[$1]=$13; next } { if(a[$2]){$2=a[$2] }; print }' OFS="\t" file1 file2 > out