Concatenate lines between lines starting with a specific pattern

Hi,

I have a file such as:
---
>contig00001 length=35524 numreads=2944
gACGCCGCGCGCCGCGGCCAGGGCTGGCCCA
CAGGCCGCGCGGCGTCGGCTGGCTGAG
>contig00002 length=4242 numreads=43423
ATGCCGAAGGTCCGCCTGGGGCTGG
CGCCGGGAGCATGTAGCG
---
I would like to concatenate the lines not starting with ">" (that is, join all the lines that sit between two lines starting with ">"). The output I want is:
---
>contig00001 length=35524 numreads=2944
gACGCCGCGCGCCGCGGCCAGGGCTGGCCCACAGGCCGCGCGGCGTCGGCTGGCTGAG
>contig00002 length=4242 numreads=43423
ATGCCGAAGGTCCGCCTGGGGCTGGCGCCGGGAGCATGTAGCG
---

Thanks


I have tried this:
% awk '{if(substr($0,1)==">") print $0"\n";else printf("%s",$0);}' test2.fna | fold -w60
But my output looks like:

>contig00001 length=35524   numreads=2944gACGCCGCGCGCCGCGGCC
AGGGCTGGCCCACGGCCcTCTTCCGGCGCGCTGCGCAGGCGTTCGGCCAGGCCGCGCGGC
GTCGGCTGGCTGAGCGCCCAGCGTAGCAGGCGATCGAACGGATGCCGACGGGCGCTTTCC
AGTCGTTCGCGCAAACGGGCGATCAACTGGGCGATCAACAGCGAGTCGCCGCCAGCCCCG
AAGAAGTCTTGCTCGACGCCCAGCGACGGGTTGTCCAGCACCTCCCGCCAGAGTGCCAGC

Instead of what I want, which is this:

>contig00001 length=35524   numreads=2944
gACGCCGCGCGCCGCGGCCAGGGCTGGCCCACGGCCcTCTTCCGGCGCGCTGCGCAGGCG
TTCGGCCAGGCCGCGCGGCGTCGGCTGGCTGAGCGCCCAGCGTAGCAGGCGATCGAACGG
ATGCCGACGGGCGCTTTCCAGTCGTTCGCGCAAACGGGCGATCAACTGGGCGATCAACAG
CGAGTCGCCGCCAGCCCCGAAGAAGTCTTGCTCGACGCCCAGCGACGGGTTGTCCAGCAC
CTCCCGCCAGAGTGCCAGC

Try this:

awk '{printf (/>/)?RS"%s"RS:"%s",$0}END{print x}' infile

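Your code does not work because substr($0,1) returns the whole line, not just its first character, so it never equals ">". To test the first character, use:

substr($0,1,1)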
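
To make the difference visible, here is a small sketch. The function name first_char_test and the two-line sample are only for illustration; for each input line it prints the result of both comparisons (1 = true, 0 = false).

```shell
# substr(s, start) without a length returns everything from start to the end
# of the string, so the first test compares the whole line against ">".
first_char_test() {
    awk '{
        whole = (substr($0, 1) == ">")      # whole line compared with ">"
        first = (substr($0, 1, 1) == ">")   # only the first character compared
        print whole, first
    }'
}

printf '>contig00001 length=35524 numreads=2944\ngACGCC\n' | first_char_test
```

For the header line only the second comparison is true, which is why the original if() always fell through to the else branch.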

Thanks. It worked. Yeah.
:slight_smile:

awk '{printf /^>/?RS $0 RS:$0}' infile

Hi Scruti ... ready for some nitpicking? :wink:

Your code adds an unexpected empty line if the input file starts with a ">" line :smiley:

I know, but I figured it would complicate the code and it would not really matter. I did add the linefeed at the end; otherwise, if the output gets written to a file, the last line is invalid because it is not terminated with a linefeed.

awk '{printf />/?(NR>1?RS:x)"%s"RS:"%s",$0}END{print x}' infile
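
For readers who prefer it spelled out, here is a longer sketch of the same idea. It is not the one-liner above: it swaps the NR>1 test for a flag (seq, a name chosen here) that records whether sequence text is still waiting for its newline, and the wrapper name join_fasta is hypothetical. It assumes headers start with ">" in column 1.

```shell
join_fasta() {
    awk '
        /^>/ {
            if (seq) printf "\n"   # finish the previous joined sequence, if any
            seq = 0
            print                  # header goes on its own line
            next
        }
        {
            printf "%s", $0        # sequence line: append without a newline
            seq = 1
        }
        END { if (seq) printf "\n" }   # terminate the last sequence line
    ' "$@"
}

# Sample data from the first post
join_fasta <<'EOF'
>contig00001 length=35524 numreads=2944
gACGCCGCGCGCCGCGGCCAGGGCTGGCCCA
CAGGCCGCGCGGCGTCGGCTGGCTGAG
>contig00002 length=4242 numreads=43423
ATGCCGAAGGTCCGCCTGGGGCTGG
CGCCGGGAGCATGTAGCG
EOF
```

With the flag there is no empty line before the first header, and the last line still ends with a linefeed. It can also be called with a file name, e.g. join_fasta infile.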

Dude, you're right, but I like to see your skill in action; that's why I challenged you :wink:

Hi,

Can you please explain this code?

awk '{printf /^>/?RS $0 RS:$0}' infile

There is one thing I did not understand in the code above:

RS $0 RS

Why is this needed?

Please explain, as I am confused by this... :slight_smile:

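RS is the record separator, which defaults to "\n". In awk, writing values next to each other concatenates them, so RS $0 RS means: a newline, followed by the record (the line), and then another newline.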
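
The one-liner can be unrolled into an if/else to show what each branch prints. This is only a sketch: the function name explain_rs is made up, and it uses an explicit "%s" format rather than passing the line itself as the format string, so a "%" in the data cannot be misread as a format specifier.

```shell
explain_rs() {
    awk '{
        if ($0 ~ /^>/) {
            # header: newline + line + newline, joined by concatenation
            printf "%s", RS $0 RS
        } else {
            # sequence line: printed as is, with no newline after it
            printf "%s", $0
        }
    }' "$@"
}

printf '>a\nAC\nGT\n' | explain_rs | od -c
```

The od -c output shows the leading newline before the header and the missing newline after the last sequence line, which are the two points discussed earlier in the thread.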


Hi,

Thanks for your reply.