Is it possible to rename FASTA headers based on their position, as specified in another file?

dineshkumarsrk · November 13, 2019, 2:01am

I have 5 sequences in a FASTA file named gene1.fasta, as follows:

gene1.fasta
>1256
ATGTAGC
>GEP
TAGAG
>GTY578
ATGCATA
>67_iga
ATGCTGA
>90_ld
ATGCTG

I need to rename the headers in gene1.fasta based on the sequence position specified in list.txt, as follows:

list.txt
position1=org5
position2=amylase
position3=org8
position4=lipase
position5=org_1

The expected output should look like this:

>org5
ATGTAGC
>amylase
TAGAG
>org8
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG

Thanks in advance.

RudiC · November 13, 2019, 2:37am

Any attempts / ideas / thoughts from your side? Applying what you learned in here?

dineshkumarsrk · November 13, 2019, 3:06am

Dear RudiC,
The script below can replace strings in gene1.fasta with the values specified in list.txt.

awk 'FNR==NR{REP[$1]=$2; next} {for (r in REP) gsub(r, REP[r])}1' FS="=" list.txt gene1.fasta

However, it is not based on position; it is purely based on matching strings between the two files. My problem is different. I tried to work it out with a counter such as ++i, but my list.txt has no strings in common with the FASTA headers, so I cannot rename them sequentially. That is why I am asking for your help.

RudiC · November 13, 2019, 4:02am

For the easy case where the replacements are listed one per line in increasing position order, try:

awk 'FNR==NR {REP[NR] = $2; next} /^>/ {$0 = ">" REP[++CNT]}1' FS="=" list.txt gene1.fasta
>org5
ATGTAGC
>amylase
TAGAG
>org8
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG
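
Spelled out over several lines, the approach above can be sketched as follows. The scratch directory, the lowercase variable names, and the recreated sample files are assumptions made only to keep the example self-contained; the logic is the same as in the one-liner, and it still assumes list.txt has one entry per header, in order.

```shell
#!/bin/sh
# Recreate the two sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '>1256' ATGTAGC '>GEP' TAGAG '>GTY578' ATGCATA \
    '>67_iga' ATGCTGA '>90_ld' ATGCTG > "$tmp/gene1.fasta"
printf '%s\n' position1=org5 position2=amylase position3=org8 \
    position4=lipase position5=org_1 > "$tmp/list.txt"

renamed=$(awk -F= '
    # First file (list.txt): store each new name under its line number.
    FNR == NR { rep[NR] = $2; next }
    # Second file: every header line takes the next stored name.
    /^>/      { $0 = ">" rep[++cnt] }
    # Print every line, changed or not.
    1
' "$tmp/list.txt" "$tmp/gene1.fasta")

printf '%s\n' "$renamed"
rm -rf "$tmp"
```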

EDIT: In case they are not (here: position3 is missing from list.txt), try:

awk 'FNR==NR {REP[$1] = $2; next} /^>/ && (TMP = "position" ++CNT) in REP {$0 = ">" REP[TMP]}1' FS="=" list.txt gene1.fasta
>org5
ATGTAGC
>amylase
TAGAG
>GTY578
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG
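
A commented sketch of the key-based lookup, assuming a list.txt in which position3 is missing. The scratch directory and the variable names are illustrative and not from the thread.

```shell
#!/bin/sh
# Sample FASTA from the question, plus a list.txt that skips position3.
tmp=$(mktemp -d)
printf '%s\n' '>1256' ATGTAGC '>GEP' TAGAG '>GTY578' ATGCATA \
    '>67_iga' ATGCTGA '>90_ld' ATGCTG > "$tmp/gene1.fasta"
printf '%s\n' position1=org5 position2=amylase \
    position4=lipase position5=org_1 > "$tmp/list.txt"

sparse=$(awk -F= '
    # First file: index the new names by their "positionN" key.
    FNR == NR { rep[$1] = $2; next }
    # Second file: build the key for this header and rename only if it exists.
    /^>/ {
        key = "position" (++cnt)
        if (key in rep) $0 = ">" rep[key]
    }
    1
' "$tmp/list.txt" "$tmp/gene1.fasta")

printf '%s\n' "$sparse"
rm -rf "$tmp"
```

Because the lookup goes through the key rather than the line number, the entries in list.txt may appear in any order, and headers without an entry are left as they are.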

dineshkumarsrk · November 13, 2019, 4:41am

@RudiC,
Both serve my purpose perfectly.

Scrutinizer · November 13, 2019, 4:38pm

Also try:

awk -F= '/^>/{if((getline<f)>0) $0=">" $2}1' f=list.txt gene1.fasta

or without the check:

awk -F= '/^>/{getline<f; $0=">" $2}1' f=list.txt gene1.fasta
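
To see what the check buys you, here is a small sketch that runs both variants against a list.txt holding only three names (an assumption for illustration; the list.txt in the question has five). The comparison is written as (getline < f) > 0 so that it does not depend on how a particular awk parses the unparenthesized form.

```shell
#!/bin/sh
# Sample FASTA from the question, plus a deliberately short list.txt.
tmp=$(mktemp -d)
printf '%s\n' '>1256' ATGTAGC '>GEP' TAGAG '>GTY578' ATGCATA \
    '>67_iga' ATGCTGA '>90_ld' ATGCTG > "$tmp/gene1.fasta"
printf '%s\n' position1=org5 position2=amylase position3=org8 > "$tmp/list.txt"

# With the check: rename only while list.txt still has lines to read.
checked=$(awk -F= '
    /^>/ { if ((getline < f) > 0) $0 = ">" $2 }
    1
' f="$tmp/list.txt" "$tmp/gene1.fasta")

# Without the check: the header is rewritten even when getline hits end of file.
unchecked=$(awk -F= '
    /^>/ { getline < f; $0 = ">" $2 }
    1
' f="$tmp/list.txt" "$tmp/gene1.fasta")

printf 'checked:\n%s\n\nunchecked:\n%s\n' "$checked" "$unchecked"
rm -rf "$tmp"
```

With the check, headers beyond the end of list.txt keep their original text. Without it, they are reduced to a bare >, because getline leaves $0 untouched at end of file and $2 of the original header is empty.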