Replace text in column 1 of a file matching a column of another file

Hi all,

I have 2 files:

species-names.txt

Abaca-bunchy-top-virus	((((Abaca-bunchy-top-virus((Babuvirus((Unassigned((Nanoviridae((Unassigned))))
Abutilon-mosaic-virus	((((Abutilon-mosaic-virus((Begomovirus((Unassigned((Geminiviridae((Unassigned))))
Abutilon-yellows-virus	((((Abutilon-yellows-virus((Crinivirus((Unassigned((Closteroviridae((Unassigned))))

sequence-names.txt

gi|145845934|gb|EF546810.1|-Abaca-bunchy-top-virus-isolate-Q767-segment-DNA-S,-complete-sequence	GGCAGGGGGGCTTATTATTACCCCCCCTGCC
gi|145845936|gb|EF546811.1|-Abutilon-mosaic-virus-isolate-Q767-segment-DNA-M,-complete-sequence	GGGGCTGGGGCTTATTATTACCCCCAGCCCCGGAACGGGACATCAC
gi|145845938|gb|EF546812.1|-Abutilon-yellows-virus-isolate-Q767-segment-DNA-C,-complete-sequence	GGCAGGGGGGCTTATTATTACCCCCCCTGCCCGGG

I need to replace the text in the 1st column of sequence-names.txt that matches the 1st column of species-names.txt with the text from the 2nd column of species-names.txt. The output should be:

gi|145845934|gb|EF546810.1|-((((Abaca-bunchy-top-virus((Babuvirus((Unassigned((Nanoviridae((Unassigned))))-isolate-Q767-segment-DNA-S,-complete-sequence	GGCAGGGGGGCTTATTATTACCCCCCCTGCC
gi|145845936|gb|EF546811.1|-((((Abutilon-mosaic-virus((Begomovirus((Unassigned((Geminiviridae((Unassigned))))-isolate-Q767-segment-DNA-M,-complete-sequence	GGGGCTGGGGCTTATTATTACCCCCAGCCCCGGAACGGGACATCAC
gi|145845938|gb|EF546812.1|-((((Abutilon-yellows-virus((Crinivirus((Unassigned((Closteroviridae((Unassigned))))-isolate-Q767-segment-DNA-C,-complete-sequence	GGCAGGGGGGCTTATTATTACCCCCCCTGCCCGGG

Thanks a lot!

try:

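awk '
# First file: map each species name (column 1) to its replacement text (column 2).
NR==FNR {a[$1]=$2; next}
# Second file: replace the first species name found in the line, then stop searching.
{for (s in a) if ($0 ~ s) {sub(s, a[s]); break}}
1
' species-names.txt sequence-names.txt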
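
If the species names could ever contain regex metacharacters (dots, brackets, plus signs), or the replacement text an ampersand, matching them as literal strings is safer. Here is a rough sketch of that variant, assuming both files are tab-separated and only column 1 should be touched. The function name replace_species is made up for illustration, and index()/substr() stand in for the regex match and sub() used above:

```shell
# Hypothetical helper (name is illustrative): replace_species SPECIES_FILE SEQUENCE_FILE
# Assumes both files are tab-separated.
replace_species() {
  awk '
    BEGIN { FS = OFS = "\t" }
    # First file: map species name (column 1) to replacement text (column 2).
    NR == FNR { map[$1] = $2; next }
    {
      for (name in map) {
        pos = index($1, name)          # literal substring search, no regex
        if (pos) {
          # Splice the replacement in where the species name was found.
          $1 = substr($1, 1, pos - 1) map[name] substr($1, pos + length(name))
          break
        }
      }
      print
    }
  ' "$1" "$2"
}
```

Called as: replace_species species-names.txt sequence-names.txt
Lines with no matching species name are printed unchanged.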

Thanks a lot rdrtx1,

Your script worked nicely.

Because column 2 of sequence-names.txt was very large, I used awk to print column 1 only, ran your script on that, and then joined the output back with column 2 of the original sequence-names.txt. Same result, and quicker.
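
For anyone wanting to do the same, the split-and-rejoin steps can be sketched roughly as below. This is an illustration, not the exact commands I ran: it assumes tab-separated files, uses cut and paste for the splitting and rejoining, matches species names as literal strings, and the function name rename_ids_only is made up.

```shell
# Hypothetical helper (name is illustrative): rename_ids_only SPECIES_FILE SEQUENCE_FILE
# Assumes both files are tab-separated.
rename_ids_only() {
  ids=$(mktemp) && seqs=$(mktemp) || return 1
  cut -f1  "$2" > "$ids"     # identifiers only (column 1)
  cut -f2- "$2" > "$seqs"    # sequences (column 2 onward), never scanned by awk
  awk -F '\t' '
    # First file: map species name (column 1) to replacement text (column 2).
    NR == FNR { map[$1] = $2; next }
    {
      for (name in map) {
        pos = index($0, name)          # literal substring search in the identifier
        if (pos) {
          $0 = substr($0, 1, pos - 1) map[name] substr($0, pos + length(name))
          break
        }
      }
      print
    }
  ' "$1" "$ids" | paste - "$seqs"      # rejoin renamed identifiers with the sequences
  rm -f "$ids" "$seqs"
}
```

Called as: rename_ids_only species-names.txt sequence-names.txt
The speed-up presumably comes from awk no longer searching the long sequence column for every species name.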

Really appreciate it, again!