awk to update field in file2 if not the same as file1

Trying to use awk to:

update $2 in file2 with the $2 value in file1 , if $1 in file1 matches $13 in file2 , which is tab-delimeted . The $2 values may already be the same so in that case nothing happens and the next line is processed.

There are exactly 4,605 unique $13 values. Thank you :).

file1

MECP2 NM_004992.3
CBLB NM_170662.3

file2

173    NM_170662.3    chr3    -    105377108    105587887    105377813    105586421    19    105377108,105389076,105397274,105400322,105400567,105404163,105412337,105420937,105422831,105438890,105452852,105456014,105459337,105464760,105470305,105495239,105572257,105586253,105587579,    105378073,105389196,105397415,105400454,105400662,105404310,105412432,105421303,105423017,105439094,105452984,105456102,105459475,105464882,105470462,105495386,105572508,105586435,105587887,    0    CBLB    cmpl    cmpl    1,1,1,1,2,2,0,0,0,0,0,2,2,0,2,2,0,0,-1,
219    NM_001110792.1    chrX    -    153295685    153363188    153295817    153363122    3    153295685,153297657,153363060,    153296901,153298008,153363188,    0    MECP2    cmpl    cmpl    2,2,0,

desired output

173    NM_170662.3    chr3    -    105377108    105587887     105377813    105586421    19     105377108,105389076,105397274,105400322,105400567,105404163,105412337,105420937,105422831,105438890,105452852,105456014,105459337,105464760,105470305,105495239,105572257,105586253,105587579,      105378073,105389196,105397415,105400454,105400662,105404310,105412432,105421303,105423017,105439094,105452984,105456102,105459475,105464882,105470462,105495386,105572508,105586435,105587887,     0    CBLB    cmpl    cmpl    1,1,1,1,2,2,0,0,0,0,0,2,2,0,2,2,0,0,-1,
219   NM_004992.3     chrX    -    153295685    153363188    153295817     153363122    3    153295685,153297657,153363060,     153296901,153298008,153363188,    0    MECP2    cmpl    cmpl    2,2,0,

awk

awk 'FNR==NR { a[$1]=$13; next } { if(a[$2]){$2=a[$2] }; print }' OFS="\t" file1 file2 > out

Let's remove a few things out and see if you see what's going on.

awk 'FNR==NR { a[$1]=$13; next }' file1

file1

How many fields does file1 has?: 2
What do you expect that it will be stored in a[$1] if you are asking for the 13th field in file1?

1 Like

I see that there is no $13 in file1 .

So the below in bold would store MECP2 in a[$1] . The the portion in italics would update $2 in file2 only if $13=a[$1] ?

I had file1 and file2 confused in the awk , but am not quite sure about the second portion. Thank you :).

awk 'FNR==NR { a[$1]=$1; next } { if(a[$2]){$13=a[$1] }; print }' OFS="\t" file1 file2 > out

Actually, field $2 is the value you need to store from file1
Give it a try to the following:

awk 'FNR==NR {a[$1]=$2; next} a[$13]{$2=a[$13]}1' OFS='\t' file1 file2
1 Like

Thank you very much for your help :slight_smile: