I only want to extract the header of sequence_2 and its content.
Does anybody have an idea how to do it?
Will awk respond faster if the file has a long list of contents?
Thanks for all of your suggestions.
Hi thegeek,
Thanks for your suggestion. It worked nicely.
Can you roughly explain the reasoning behind the code?
sed -n -e '/>sequence_3/q' -e '/>sequence_2/,/>sequence_3/p' t1
For example, if I have a long list of contents and only want to extract specific contents based on the header I am interested in, can I use the sed code that you recommended as well?
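To make the reasoning easier to follow, here is the same sed command with each part annotated. This is only a sketch: the sample file and the variable names `sample` and `result` are made up for illustration, since the real t1 is not shown.

```shell
# Hypothetical sample input standing in for t1.
sample=$(mktemp)
printf '%s\n' '>sequence_1' 'AAAA' '>sequence_2' 'CCCC' 'GGGG' '>sequence_3' 'TTTT' > "$sample"

# -n                               print nothing unless a command asks for it
# -e '/>sequence_3/q'              quit as soon as the next header is reached,
#                                  before the second expression can print it
# -e '/>sequence_2/,/>sequence_3/p' print every line from the wanted header onward
result=$(sed -n -e '/>sequence_3/q' -e '/>sequence_2/,/>sequence_3/p' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The same pattern should carry over to other headers by swapping in the wanted header and the header that follows it, but it does assume you know which header comes next in the file.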
The following is more generic. It also works when the actual label is not "sequence_2" but the OP means the second record, assuming a ">" at the beginning of a line marks the label of a new record:
mawk 'BEGIN {RS="\n>"; printf">"} NR==2' infile
or gawk. As danmero pointed out, this code does not work with standard awk, nawk, or POSIX awk. Those versions only accept a single character for RS.
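If a multi-character RS is not available, one possible workaround is to count header lines instead. This is a rough sketch, not the code from the posts above; the sample file and the names `sample`, `second` and `n` are assumptions for illustration.

```shell
# Hypothetical sample input; each record starts with a ">" header line.
sample=$(mktemp)
printf '%s\n' '>sequence_1' 'AAAA' '>sequence_2' 'CCCC' 'GGGG' '>sequence_3' 'TTTT' > "$sample"

# n counts the header lines seen so far.
# Lines are printed while n == 2, i.e. inside the second record.
# awk exits at the third header, so the rest of the file is never read.
second=$(awk '/^>/ { if (++n > 2) exit } n == 2' "$sample")

printf '%s\n' "$second"
rm -f "$sample"
```

This keeps the default RS, so it should behave the same in nawk, mawk and gawk.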
The range pattern prints from the record that matches the _2$ pattern to the end of the input (0 -> false -> never -> eof).
And of course, we exit prematurely because of the previous action.
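A minimal sketch of that idea, written out in long form: a range whose end pattern is 0 never closes, and an earlier rule exits at the next header. This is a reconstruction under assumptions, not necessarily the code being described; the sample file and the names `sample`, `seen` and `matched` are hypothetical.

```shell
# Hypothetical sample input.
sample=$(mktemp)
printf '%s\n' '>sequence_1' 'AAAA' '>sequence_2' 'CCCC' 'GGGG' '>sequence_3' 'TTTT' > "$sample"

matched=$(awk '
  seen && /^>/ { exit }              # a new header after the match: stop early
  /_2$/, 0     { seen = 1; print }   # range from the first _2$ line; end pattern 0 is never true
' "$sample")

printf '%s\n' "$matched"
rm -f "$sample"
```

Without the first rule the range would simply run to the end of the file, which is what "0 -> false -> never -> eof" spells out.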
Just a few words about the beauty of the programming code ...
We often try to play golf[1] here and we're doing it for fun.
In my opinion, a piece of code or a program is beautiful when:
it's self-documenting (!)
it's concise and simple (as simple as possible)
it takes advantage of the full functionality/potential of the given programming language
That said, at least as far as my posts are concerned, you should take those obfuscated and golfed samples for what they are.
Try to understand them, use them on the command line, but don't use them in scripts and/or production code.
Think about the next maintainer of that code.
Thanks, that's good advice about readability. What prompted me to join this forum is the need to learn to write code that runs as quickly as possible. At work I am now writing tools in ksh and nawk that run against gigabytes of data. I have been learning to optimise code recently and am astonished by the improvement in speed that can be achieved, particularly by not creating extra processes in a loop. One script I optimised recently went from 5+ hours to 20 minutes of run time simply by minimising the processes being kicked off in two loops! Hence my interest in writing "lean" code. Cheers
If I have a long list of files, how can I use your script or program to extract only the contents of sequence_2, ABC_6 and SDF_7?
Do you have any idea how I can extract only specific contents from a long list of files?
When I tried it, the awk script that you suggested could only extract sequence_2 from a long list of files.
Thanks again :)
My file also has content headers like ABC_61, ABC_605 and SDF_750.
The code that you suggested extracts all of them as well.
Do you have a better idea for extracting only sequence_2, ABC_6 and SDF_7 specifically? Many thanks for your suggestion ^^
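One way to avoid the partial matches is to compare the whole header line against a list of wanted labels instead of using a regular expression. The sketch below assumes each header line is exactly ">" followed by the label, with nothing after it; the file names, sample data, and the names `wanted`, `keep`, `show` and `picked` are made up for illustration.

```shell
# Hypothetical sample files containing both wanted and look-alike headers.
dir=$(mktemp -d)
printf '%s\n' '>sequence_2' 'CCCC' '>ABC_61' 'XXXX' '>ABC_6' 'GGGG' > "$dir/a.txt"
printf '%s\n' '>SDF_750' 'YYYY' '>SDF_7' 'TTTT' '>ABC_605' 'ZZZZ' > "$dir/b.txt"

picked=$(awk -v wanted='sequence_2,ABC_6,SDF_7' '
  BEGIN {
    # Build a lookup table keyed by the full header line, e.g. ">ABC_6".
    n = split(wanted, names, ",")
    for (i = 1; i <= n; i++) keep[">" names[i]] = 1
  }
  /^>/ { show = ($0 in keep) }   # exact match, so ">ABC_61" is not kept
  show                           # print header and content of kept records
' "$dir/a.txt" "$dir/b.txt")

printf '%s\n' "$picked"
rm -rf "$dir"
```

Because it is a single awk process reading all files in one pass, it should also cope reasonably with long inputs. If the real headers carry extra text after the label, the comparison would need to use the first field rather than the whole line.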