I have 5 sequences in a FASTA file named gene1.fasta, as follows:
gene1.fasta
>1256
ATGTAGC
>GEP
TAGAG
>GTY578
ATGCATA
>67_iga
ATGCTGA
>90_ld
ATGCTG
I need to rename the sequence headers in gene1.fasta by position, using the names specified in list.txt, as follows:
list.txt
position1=org5
position2=amylase
position3=org8
position4=lipase
position5=org_1
The expected output should look like this:
>org5
ATGTAGC
>amylase
TAGAG
>org8
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG
Thanks in advance.
RudiC
November 13, 2019, 2:37am
Any attempts / ideas / thoughts from your side? Applying what you learned in here?
Dear RudiC,
The script below replaces strings in gene1.fasta with the replacements given in list.txt:
awk 'FNR==NR{REP[$1]=$2; next} {for (r in REP) gsub(r, REP[r])}1' FS="=" list.txt gene1.fasta
However, it is not based on position; it is based purely on matching strings between the two files. My problem is different: I tried working with a counter like ++i, but my list.txt has no strings in common with gene1.fasta, so I cannot rename the headers sequentially. That is why I am seeking your help.
RudiC
November 13, 2019, 4:02am
For the easy case where the replacements in list.txt are listed in increasing order, one per line, try
awk 'FNR==NR {REP[NR] = $2; next} /^>/ {$0 = ">" REP[++CNT]}1' FS="=" list.txt gene1.fasta
>org5
ATGTAGC
>amylase
TAGAG
>org8
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG
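For readers who want to see what the one-liner above does step by step, here is a commented sketch of the same logic wrapped in a shell function. The function name rename_by_order is only a hypothetical label for illustration, and the sketch assumes list.txt has exactly one name=value pair per line, in the same order as the FASTA headers.

```shell
# rename_by_order is a hypothetical helper name, not part of the original post.
# $1 = list file (one "positionN=name" entry per line), $2 = FASTA file
rename_by_order() {
    awk -F= '
        # First file: remember the new name of each line, keyed by line number.
        FNR == NR { rep[NR] = $2; next }
        # Second file: replace the n-th header with the n-th stored name.
        /^>/      { $0 = ">" rep[++cnt] }
        # Print every line, modified or not.
        1
    ' "$1" "$2"
}
```

It would be called as rename_by_order list.txt gene1.fasta. Note that the keys (position1, position2, ...) are ignored here; only the line order of list.txt matters.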
EDIT: if they are not (for example, if position3 is missing from list.txt), try
awk 'FNR==NR {REP[$1] = $2; next} /^>/ && (TMP = "position" ++CNT) in REP {$0 = ">" REP[TMP]}1' FS="=" list.txt gene1.fasta
>org5
ATGTAGC
>amylase
TAGAG
>GTY578
ATGCATA
>lipase
ATGCTGA
>org_1
ATGCTG
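The keyed variant above can be spelled out in the same way. This is only a sketch under the assumption that every key in list.txt has the form "position" followed by the 1-based header number; rename_by_key is a hypothetical name used for illustration.

```shell
# rename_by_key is a hypothetical helper name, not part of the original post.
# $1 = list file ("positionN=name" entries, any order, gaps allowed), $2 = FASTA file
rename_by_key() {
    awk -F= '
        # First file: store each new name under its full key, e.g. rep["position4"].
        FNR == NR { rep[$1] = $2; next }
        # Second file: build the key for the current header and
        # rename only if list.txt actually contains that key.
        /^>/ {
            key = "position" (++cnt)
            if (key in rep) $0 = ">" rep[key]
        }
        # Print every line, modified or not.
        1
    ' "$1" "$2"
}
```

Because the lookup goes through the key rather than the line number, headers without a matching entry keep their original name, and the lines of list.txt may appear in any order.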
@RudiC ,
Both serve my purpose perfectly.
Also try:
awk -F= '/^>/{if((getline<f)>0) $0=">" $2}1' f=list.txt gene1.fasta
or without the check:
awk -F= '/^>/{getline<f; $0=">" $2}1' f=list.txt gene1.fasta
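The two getline variants above behave differently only when list.txt has fewer lines than gene1.fasta has headers. A small sketch to compare them follows; the function names rename_checked and rename_unchecked are hypothetical and exist only for this comparison.

```shell
# Hypothetical wrappers around the two getline variants, for comparison only.
# $1 = list file, $2 = FASTA file

# With the check: a header is rewritten only if a line could be read from the list.
rename_checked() {
    awk -F= '/^>/ { if ((getline < f) > 0) $0 = ">" $2 } 1' f="$1" "$2"
}

# Without the check: once the list is exhausted, getline leaves the header
# line in place, $2 is empty (the header contains no "="), and the header
# is reduced to a bare ">".
rename_unchecked() {
    awk -F= '/^>/ { getline < f; $0 = ">" $2 } 1' f="$1" "$2"
}
```

With a one-line list.txt and a FASTA file with two headers, rename_checked keeps the second header unchanged, while rename_unchecked prints ">" in its place. If the two files are guaranteed to match in length, both give the same result.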