Using = with sed to increase sequence count

I have a fasta file like this one:

>ID1
AAAAAA
>ID2
TTTTTT

And I am using this sed pipeline to number the sequences:

sed '/^>/s/.*//;/^$/=;/^$/d' text.txt | sed 's/[1-9].*/echo ">seq" \$(( ( & + 1 )\/2 ))/e'

I get the desired output:

>seq 1
AAAAAA
>seq 2
TTTTTT
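To see what each half of that pipeline does, here is a small sketch that runs the two stages separately on the sample data. It assumes GNU sed (the e flag of s/// is a GNU extension), and the helper names sample, stage1 and stage2 are made up for illustration. Note that the (n + 1) / 2 arithmetic only gives the record number because every record is exactly two lines long, so the headers sit on lines 1, 3, 5, ...

```shell
# Hypothetical helper: the two-record sample from the question.
sample() { printf '>ID1\nAAAAAA\n>ID2\nTTTTTT\n'; }

# Stage 1: empty each header line, let = print its line number,
# then delete the (now empty) line.
stage1() { sed '/^>/s/.*//;/^$/=;/^$/d'; }

# Stage 2 (GNU sed only): turn a line number n into the command
#   echo ">seq" $(( ( n + 1 )/2 ))
# and let the e flag replace the line with that command's output.
stage2() { sed 's/[1-9].*/echo ">seq" \$(( ( & + 1 )\/2 ))/e'; }

sample | stage1            # prints 1, AAAAAA, 3, TTTTTT (one per line)
sample | stage1 | stage2   # prints >seq 1, AAAAAA, >seq 2, TTTTTT
```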

However, this doesn't work and I do not understand why:

sed '/^>/s/.*//;/^$/=;/^$/d;s/[1-9].*/echo ">seq" \$(( ( & + 1 )\/2 ))/e' text.txt

I was hoping someone here would help me understand the issue. I was also hoping for a better, more elegant sed solution. While perl or awk might be more appropriate, I am really looking for a 100% sed approach.
Thanks in advance

sed is a great tool, but since you can't perform arithmetic calculations in sed, a 100% sed solution is not possible.

An awk solution for this is simple:

awk '/^>/{$0 = ">seq " ++seq}1' file

Your (certainly simplified and thus non-representative) sample lends itself to

sed 's/>ID/>seq /' file
>seq 1
AAAAAA
>seq 2
TTTTTT

EDIT: And here it is, the inefficient but 100% sed solution (tadaaa!):

sed 'N; s/\n/#/' file | sed '=' | sed -r 'N; s/\n//; s/^([0-9]*)>ID[0-9]*#/>seq \1\n/'
>seq 1
AAAAAA
>seq 2
TTTTTT
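A sketch of that pipeline run one stage at a time, to show where the numbers come from: the first sed glues each two-line record onto one line, so the = in the second sed counts records rather than physical lines. It assumes GNU sed (for -r and for \n producing a newline in the replacement) and headers of the form >ID followed by digits; the function names are hypothetical helpers.

```shell
# Hypothetical helper: the two-record sample from the question.
sample() { printf '>ID1\nAAAAAA\n>ID2\nTTTTTT\n'; }

# Stage 1: N appends the sequence line; s/\n/#/ joins the record into
# one line such as >ID1#AAAAAA.
stage1() { sed 'N; s/\n/#/'; }

# Stage 2: = prints the line number before each line, and since every
# line is now a whole record, that number is the record count.
stage2() { sed '='; }

# Stage 3: N pulls the record up next to its number, then the number
# goes into the new header and the # becomes a newline again.
stage3() { sed -r 'N; s/\n//; s/^([0-9]*)>ID[0-9]*#/>seq \1\n/'; }

sample | stage1                     # >ID1#AAAAAA, >ID2#TTTTTT
sample | stage1 | stage2            # 1, >ID1#AAAAAA, 2, >ID2#TTTTTT
sample | stage1 | stage2 | stage3   # >seq 1, AAAAAA, >seq 2, TTTTTT
```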

or, even simpler,

sed 'N; s/^.*\n//' file | sed '=' | sed '/^[0-9]\+/ s/^/>seq /'

Don't change your input file structure and then complain that it doesn't work ...
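A small sketch illustrating that remark, with the simpler pipeline wrapped in a hypothetical function number_records (GNU sed assumed, because of \+). The first sed keeps only the second line of every pair, so the result is correct only as long as each record is exactly one header plus one sequence line.

```shell
# Hypothetical wrapper around the "even simpler" pipeline.
number_records() {
  # 1st sed: drop the first line (the header) of every pair of lines.
  # 2nd sed: = prints a running number before each remaining line.
  # 3rd sed: prefix every line that starts with a digit with ">seq ".
  sed 'N; s/^.*\n//' | sed '=' | sed '/^[0-9]\+/ s/^/>seq /'
}

# One header + one sequence line per record: works as intended.
printf '>ID1\nAAAAAA\n>ID2\nTTTTTT\n' | number_records

# First record wrapped over two lines: the pairs shift by one line, the
# line CCC is lost and the original >ID2 header leaks into the output.
printf '>ID1\nAAA\nCCC\n>ID2\nTTT\n' | number_records
```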


Hi, for fun, a version using only one sed command:

sed -e '1{
 x
 s/.*/0/
 x
}
/^>/{
 x
 :d
 s/9\(_*\)$/_\1/
 td
 s/^\(_*\)$/0\1/
 s/8\(_*\)$/9\1/
 s/7\(_*\)$/8\1/
 s/6\(_*\)$/7\1/
 s/5\(_*\)$/6\1/
 s/4\(_*\)$/5\1/
 s/3\(_*\)$/4\1/
 s/2\(_*\)$/3\1/
 s/1\(_*\)$/2\1/
 s/0\(_*\)$/1\1/
 s/_/0/g
 x
 G
 s/.*\n/>seq /
}'  file

The increment code is taken from the GNU sed documentation (info sed).

Regards.
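The heart of that script is the decimal increment. Here it is pulled out into a hypothetical increment function so it can be tried on its own (GNU sed assumed). Trailing 9s are turned into _ placeholders, the digit in front of them is bumped by one (a 0 is prepended first if every digit was a 9), and the placeholders finally become 0s. In the full script, the hold space simply keeps the current counter between header lines.

```shell
# Hypothetical wrapper: the increment part of the single-sed script.
increment() {
  # :d / td loop: replace trailing 9s with _ placeholders, one per pass.
  # If only placeholders are left (the number was all 9s), prepend a 0.
  # Bump the digit just before the placeholders; 8->9 is tried first,
  #   so a digit that was just bumped is not bumped a second time.
  # Finally turn the placeholders into 0s.
  sed -e ':d
s/9\(_*\)$/_\1/
td
s/^\(_*\)$/0\1/
s/8\(_*\)$/9\1/
s/7\(_*\)$/8\1/
s/6\(_*\)$/7\1/
s/5\(_*\)$/6\1/
s/4\(_*\)$/5\1/
s/3\(_*\)$/4\1/
s/2\(_*\)$/3\1/
s/1\(_*\)$/2\1/
s/0\(_*\)$/1\1/
s/_/0/g'
}

# 0 -> 1, 41 -> 42, 199 -> 200, 999 -> 1000
for n in 0 41 199 999; do
  printf '%s -> %s\n' "$n" "$(echo "$n" | increment)"
done
```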

[not-quite-serious-mode]

Ha! This is perhaps the first time I have found something to nit-pick in anything the infallible Don has pontificated. Actually, it is possible to do arithmetic in sed. Here, for example, is addition/subtraction (from Stack Overflow):

s/[0-9]/<&/g
s/0//g
s/1/|/g
s/2/||/g
s/3/|||/g
s/4/||||/g
s/5/|||||/g
s/6/||||||/g
s/7/|||||||/g
s/8/||||||||/g
s/9/|||||||||/g
: tens
s/|</<||||||||||/g
t tens
s/<//g
s/+//g
: minus
s/|-|/-/g
t minus
s/-$//
: back
s/||||||||||/</g
s/<\([0-9]*\)$/<0\1/
s/|||||||||/9/
s/||||||||/8/
s/|||||||/7/
s/||||||/6/
s/|||||/5/
s/||||/4/
s/|||/3/
s/||/2/
s/|/1/
s/</|/g
t back

In fact, sed is a (Turing-)complete programming language. This can be shown either by writing a Turing machine in sed (shown here) or by writing an interpreter for another Turing-complete language. With much fanfare, here is a Brainfuck interpreter written in sed.

[/not-quite-serious-mode]

I hope this helps (well, actually I doubt it, but this is a holiday where I am, so it is a day off and it is fun).

bakunin

PS: Input to the sed-script above would be "100+15" or "250-173"
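For anyone wondering how the decimal-to-unary part works, here is a sketch of just that front end of the script above, wrapped in a hypothetical to_unary function (GNU sed assumed). Each digit becomes that many | characters behind a < marker; the tens loop then moves any bar that sits just left of a < past it and turns it into ten bars, so a digit's bars are multiplied by ten once for every digit position to its right.

```shell
# Hypothetical wrapper: decimal-to-unary front end of the addition script.
to_unary() {
  # Mark every digit with <, write each digit as that many bars,
  # then loop until no bar is left standing in front of a marker.
  sed 's/[0-9]/<&/g
s/0//g; s/1/|/g; s/2/||/g; s/3/|||/g; s/4/||||/g
s/5/|||||/g; s/6/||||||/g; s/7/|||||||/g; s/8/||||||||/g; s/9/|||||||||/g
:tens
s/|</<||||||||||/g
t tens
s/<//g'
}

bars=$(echo 23 | to_unary)
echo "$bars"      # 23 | characters
echo "${#bars}"   # 23
```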


@bakunin: now, please, show us how to multiply floats with sed. I'd like to see you juggle 1E18 lucifer matches ("Streichholz" in German) ...

Sigh, and that on my day off. Fortunately there is aunt Google, who is always there when I need her. From math - Addition with 'sed' - Unix & Linux Stack Exchange:

sed 's/[0-9]/<&/g
s/0//g; s/1/|/g; s/2/||/g; s/3/|||/g; s/4/||||/g; s/5/|||||/g; s/6/||||||/g
s/7/|||||||/g; s/8/||||||||/g; s/9/|||||||||/g
: tens
s/|</<||||||||||/g
t tens
s/<//g
s/.*\*$/0/
s/^\*.*/0/
s/*|/*/
: mult
s/\(|*\)\*|/\1<\1*/ 
t mult
s/*//g
s/<//g
: back
s/||||||||||/</g
s/<\([0-9]*\)$/<0\1/
s/|||||||||/9/; s/||||||||/8/; s/|||||||/7/; s/||||||/6/; s/|||||/5/; s/||||/4/
s/|||/3/; s/||/2/; s/|/1/
s/</|/g
t back'

and @Wisecracker: the implementation of an FFT in sed is left to the interested reader. ;-))

bakunin

PS: my favourite math quote: "Base 8 is actually like base 10 - if you are missing two fingers." (Tom Lehrer, "New Math")

No complaints here! Works like a charm!

For my own education, why wouldn't this work?

sed 'N; s/^.*\n//;=;/^[0-9]\+/ s/^/>seq /' file

Or this:

sed '/^>/s/.*//;/^$/=;/^$/d;s/[1-9].*/echo ">seq" \$(( ( & + 1 )\/2 ))/e' text.txt

Thanks for the help
PS:
disedorgue, thanks for your reply
bakunin, Rudy and Don, I really do appreciate it! I always learn something new from you guys.



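The = command prints immediately to stdout; its output never enters the pattern space, so it can't be changed within the same sed invocation. Also, since N has already read the second line of each record, the number it prints goes up by two (2, 4, ...).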

Same here: the numbers printed by = never reach the s///e command, so you need to end that sed and run a new process on its output.
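A quick sketch (GNU sed assumed; sample is a made-up helper) that runs both single-invocation attempts on the sample data. It shows that the numbers printed by = go straight to stdout, while the later commands only ever see the sequence lines.

```shell
# Hypothetical helper: the two-record sample from the question.
sample() { printf '>ID1\nAAAAAA\n>ID2\nTTTTTT\n'; }

# Attempt 1: = writes 2 and 4 (line numbers after N) directly to stdout,
# and /^[0-9]\+/ is tested against AAAAAA, so no header is built.
sample | sed 'N; s/^.*\n//;=;/^[0-9]\+/ s/^/>seq /'   # 2, AAAAAA, 4, TTTTTT

# Attempt 2: d ends the cycle for header lines, and the s///e only sees
# sequence lines, which contain no digit, so nothing is substituted.
sample | sed '/^>/s/.*//;/^$/=;/^$/d;s/[1-9].*/echo ">seq" \$(( ( & + 1 )\/2 ))/e'   # 1, AAAAAA, 3, TTTTTT
```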

Rudy
Thanks!