Delete some words

Hi, I have a FASTA file like this:

>contig00003  length=363  numreads=45  gene=isogroup00001  status=it_thresh
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010  length=760  numreads=49  gene=isogroup00001  status=it_thresh
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W length=257
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO length=105
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA

.....

How can I remove the additional information from each sequence header and get a file like this:

>contig00003
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA

.....

Thanks

awk '/^>/ { NF=1 } 1' inputfile > outputfile
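
For anyone wondering why this works, here is a minimal sketch on a shortened sample (the file name sample.fa and the truncated sequences are only for illustration). On lines starting with ">", setting NF=1 makes awk rebuild the record from its first whitespace-separated field, so everything after the ID is dropped; the trailing 1 is an always-true pattern that prints every line. Rebuilding the record when NF is assigned is what gawk and mawk do; older awk implementations may behave differently, so treat this as an assumption about your awk.

```shell
# Build a small sample file (hypothetical name; headers copied from the question).
printf '%s\n' \
  '>contig00003  length=363  numreads=45  gene=isogroup00001  status=it_thresh' \
  'GATTTTTTACCC' \
  '>G383C4U02H6B5W length=257' \
  'CCGGGCTCCCC' > sample.fa

# Header lines: keep only the first field. Sequence lines: left untouched.
# The bare "1" prints every line, modified or not.
result=$(awk '/^>/ { NF=1 } 1' sample.fa)
printf '%s\n' "$result"
```

Sequence lines contain no whitespace, so they would pass through unchanged even without the /^>/ guard; the guard just makes the intent explicit.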

Thanks, that works perfectly.


Hello the_simpsons,

The following may also help.

awk '/^>/ {$0=$1} 1' filename

Output will be as follows.

>contig00003
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA

EDIT: Adding one more solution for the same problem.

[singh@localhost awk_programming]$ awk '/^>/ {print $1} !/^>/ {print $0}' filename
>contig00003
GATTTTTTACCCTGGGAGTGAGGAGGACGAGGTTGAGGATGAAGAAAAGAGAAAGATGAAGAGGTTGAGGATGTT
GTAGTCGGCGGTGGAATTAGGGGGAGCCGGCGAGCCCAAGTATTTTGCAGAGGTGTCTTCATCATCCAAACAACA
CGAGAGGGTGCAATTTGGTCTCTGCGTTGTTATAGATCCAAAGTTTTTGGACCCTGTTTGGCATCGTGTATCAAGTA
TTGGTTACACAGTCTATATTTTCAAGAACGAGACTGTGAAAGCTGTAAGCAACTTTTTATTtATCTATTTATTTTTATG
CTATAGCTTAtattaaactta
>contig00010
TCAAAGTTTTAGGTTCCAATTTGTATGGCTCAACTTAAGAAGTTTGTTGTAAAAAaGGAAATTCTTTCTGATCTATTA
GGGGCAGAAGTGCCACAATATATGAAGTTGAGAAATTAAaTAAAGTAATCATAGTACATTGTCTCGTTTGGATAGAC
GTAGGCTCTCAaGAAAAAAaGTTCTCATAGTTCTTGATGATGTGGATGATTTAGTGCGGCAAGTAGAACCTGGTCAA
GGGAGTAGAATAATTATGACAAGCAGAGATAGACAATTGAGTGAAGCTCTCTGCCTGTTTTGCAAGCATGCCTTCAA
GCGACAATTTCTAAGAACAGGATATTTAATAAGGCAATTGATCATGCTCAGGG
>G383C4U02H6B5W
CCGGGCTCCCCATCTTCTCTATCTCTTGTGTGATTGTTGCAGAATACATCAAAGACTTGGGGTTGAGAGAGACAGCA
TCATAAACCTGATCACGGAAGCCCCTTTGAAGCATGCAGTCCACCTCATCTAGCCTTGTTGTCGTTGAAATAGTCCAT
CTGCCATCTTTAAATACATGCGCAACATAATGCCCGCATTGCGTATCTAGTCCAATATCATGCTTTATTAAAAAGATCA
ATAAGCCTTCCTGGAGTCCCCACAATCAAGTTCCAACTCCTTGCTGAAATGCGGTAGAGTTGTCCAGCCATCA
>G383C4U02IH1AO
TTCAAGGAACTTTCATCCATCCAATGATCTAACCAATTTGAACCTAGTTTTGATTCATCTCTGAAGTTCGAATTTGAAC
CACATTCTTAAGAATTGAGGGCCCATCAAATTTAGTACTATAATCATGAAGTAGGTGATCCTCTCTTGTCACTCTTTTC
ATCATCAGCAAGATGACTTCTCATTGGAATGCTACCATGCTTGTTCCAAAA

Thanks,
R. Singh
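
A quick way to check the result, sketched here on a tiny made-up file (in.fa and out.fa are hypothetical names), is to count the header lines that still have more than one field after the conversion. The count should be 0 if every header was trimmed to its ID.

```shell
# Tiny sample with one annotated header (header text taken from the question).
printf '%s\n' \
  '>contig00010  length=760  numreads=49  gene=isogroup00001  status=it_thresh' \
  'TCAAAGTTTTAGG' > in.fa

# Trim the headers with one of the one-liners from this thread.
awk '/^>/ { $0 = $1 } 1' in.fa > out.fa

# Count header lines that still carry extra fields; "n + 0" forces a numeric 0
# when no such line was seen.
leftover=$(awk '/^>/ && NF > 1 { n++ } END { print n + 0 }' out.fa)
echo "$leftover"
```

Running the same count on the untrimmed in.fa would report 1 here, which makes it easy to see the difference before and after.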