Create specific output

I have a file that looks like the below and only need certain lines, but am not sure how to do that. It is basically everything from the @HD up to the @RG. Thank you :).

file.txt - input

@HD    VN:1.4    GO:none    SO:coordinate @SQ    SN:chr1    LN:249250621 @SQ    SN:chr2    LN:243199373 @SQ    SN:chr3    LN:198022430 @SQ    SN:chr4    LN:191154276 @SQ    SN:chr5    LN:180915260 @SQ    SN:chr6    LN:171115067 @SQ    SN:chr7    LN:159138663 @SQ    SN:chr8    LN:146364022 @SQ    SN:chr9    LN:141213431 @SQ    SN:chr10    LN:135534747 @SQ    SN:chr11    LN:135006516 @SQ    SN:chr12    LN:133851895 @SQ    SN:chr13    LN:115169878 @SQ    SN:chr14    LN:107349540 @SQ    SN:chr15    LN:102531392 @SQ    SN:chr16    LN:90354753 @SQ    SN:chr17    LN:81195210 @SQ    SN:chr18    LN:78077248 @SQ    SN:chr19    LN:59128983 @SQ    SN:chr20    LN:63025520 @SQ    SN:chr21    LN:48129895 @SQ    SN:chr22    LN:51304566 @SQ    SN:chrX    LN:155270560 @SQ    SN:chrY    LN:59373566 @SQ    SN:chrM    LN:16569 @RG    ID:8AH6U.IonXpress_009    PL:IONTORRENT    PU:Unspecified/P1.1.17/IonXpress_009    FO:TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACG    DT:2015-06-03T14:03:02-0700    SM:E1    PG:tmap    KS:TCAGTGAGCGGAACGAT    CN:TorrentServer/Proton 

Desired output

@HD    VN:1.4    GO:none    SO:coordinate @SQ    SN:chr1    LN:249250621 @SQ    SN:chr2    LN:243199373 @SQ    SN:chr3    LN:198022430 @SQ    SN:chr4    LN:191154276 @SQ    SN:chr5    LN:180915260 @SQ    SN:chr6    LN:171115067 @SQ    SN:chr7    LN:159138663 @SQ    SN:chr8    LN:146364022 @SQ    SN:chr9    LN:141213431 @SQ    SN:chr10    LN:135534747 @SQ    SN:chr11    LN:135006516 @SQ    SN:chr12    LN:133851895 @SQ    SN:chr13    LN:115169878 @SQ    SN:chr14    LN:107349540 @SQ    SN:chr15    LN:102531392 @SQ    SN:chr16    LN:90354753 @SQ    SN:chr17    LN:81195210 @SQ    SN:chr18    LN:78077248 @SQ    SN:chr19    LN:59128983 @SQ    SN:chr20    LN:63025520 @SQ    SN:chr21    LN:48129895 @SQ    SN:chr22    LN:51304566 @SQ    SN:chrX    LN:155270560 @SQ    SN:chrY    LN:59373566 @SQ    SN:chrM    LN:16569 

What do you mean by "I have a file that looks like the below and only need certain lines"? The input you have shown us is a single line, and the output you want is also a single (although shorter) line.
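If the real file actually has one header record per line (as a standard SAM header does, with @HD, @SQ and @RG each starting its own line), then "certain lines" could be picked out with a line-based filter instead. Here is a minimal sketch that assumes that layout. The name header_before_rg is just a hypothetical helper:

```shell
# Hypothetical helper: print from the first line starting with @HD up to,
# but not including, the first line starting with @RG.
# Assumes one header record per line (standard SAM layout).
header_before_rg() {
  awk '
    /^@RG/ { exit }      # stop reading at the first @RG line
    /^@HD/ { keep = 1 }  # start printing once @HD has been seen
    keep                 # print the current line while keep is set
  ' "$@"
}

# Usage: header_before_rg file.txt
```

If the data comes from a BAM file, the header could first be extracted with samtools view -H, e.g. samtools view -H in.bam | header_before_rg (in.bam is a placeholder name).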

Are you saying that some lines don't contain one or both of @HD and @RG and that you want to eliminate those lines?

You said: "It is basically everything from the @HD up to the @RG.", but you didn't keep the space(s) before the @RG that were present in your input.
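If the whitespace in front of @RG should be dropped as well, a sed substitution can delete the first @RG together with any spaces or tabs directly before it. This is a sketch that assumes the fields are separated by spaces or tabs. The name trim_before_rg is hypothetical:

```shell
# Hypothetical helper: on lines starting with @HD, delete the first "@RG",
# any whitespace immediately before it, and everything after it.
# Every @HD line is printed (unchanged if it has no @RG); other lines are not.
trim_before_rg() {
  sed -n '/^@HD/{
    s/[[:space:]]*@RG.*//
    p
  }' "$@"
}

# Usage: trim_before_rg file.txt
```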

Please give us a clearer description of what you are trying to do and show us what you have tried.

What operating system and shell are you using?

Hi cmccabe,

Don is absolutely right.
However, you can try this if it helps

 sed -n 's/^\(@HD.*\)@RG.*/\1/p' file.txt
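Be aware that .* in that sed is greedy. If a line ever contains more than one @RG, everything up to the last @RG is kept. If the cut should happen at the first @RG instead, awk's index() function is one way to do it. Here is a sketch. The name before_first_rg is hypothetical:

```shell
# Hypothetical helper: print the part of each @HD line before the FIRST "@RG".
# index() gives the 1-based position of the first match, or 0 if absent,
# so lines without "@RG" print nothing, like the sed -n ... /p version.
before_first_rg() {
  awk '/^@HD/ { i = index($0, "@RG"); if (i) print substr($0, 1, i - 1) }' "$@"
}

# Usage: before_first_rg file.txt
```

For the single-@RG line in file.txt, this gives the same result as the sed above.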

It depends on the split point. What is the rule you want to use?

  • Line length (truncate)
  • Number of fields
  • Specific delimiter - single character or string
  • Multiple delimiters e.g. after the 10th @
  • Something else
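Each of those rules maps onto a standard one-liner. This is a sketch with hypothetical helper names. It assumes tab-separated fields for the field count and plain ASCII text for the character count:

```shell
# Hypothetical helpers, one per splitting rule; each reads standard input.

# Line length: keep the first N characters of each line.
by_length() { cut -c "1-$1"; }

# Number of fields: keep the first N tab-separated fields.
by_fields() { cut -f "1-$1"; }

# Specific delimiter: keep the text before the first occurrence of a string
# (awk treats a multi-character -F value as a regular expression).
by_string() { awk -F "$1" '{ print $1 }'; }

# Multiple delimiters: keep the text before the Nth '@'.
before_nth_at() { cut -d '@' -f "1-$1"; }
```

For example, by_string '@RG' < file.txt or before_nth_at 10 < file.txt. Lines that lack the delimiter are printed whole by both awk and cut.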

If you can explain how you want to split the data, then we can work something out, but a single line of input doesn't make the rule clear.

Thanks, in advance,
Robin