How to print only lines in between patterns?

jamo · April 14, 2013, 2:12pm

Hi,

I want to print only the lines (the green-italic lines) between the first and last line of each run of identical strings in column 9.
The number of lines differs from one string to the next.

10 AUGUSTUS exon 4558 4669 . - . 10.g1
10 AUGUSTUS exon 8771 8889 . - . 10.g1
10 AUGUSTUS exon 16216 16284 . - . 10.g1
10 AUGUSTUS exon 17048 17135 . - . 10.g1
10 AUGUSTUS exon 17366 17525 . + . 10.g2
10 AUGUSTUS exon 19544 19603 . + . 10.g2
10 AUGUSTUS exon 20007 20109 . + . 10.g2
10 AUGUSTUS exon 23737 23937 . + . 10.g2
10 AUGUSTUS exon 25123 25203 . + . 10.g2
10 AUGUSTUS exon 27110 27939 . + . 10.g2
10 AUGUSTUS exon 50636 50833 . + . 10.g3
10 AUGUSTUS exon 59097 59219 . + . 10.g3
10 AUGUSTUS exon 60051 60138 . + . 10.g3
10 AUGUSTUS exon 61590 61689 . + . 10.g3
10 AUGUSTUS exon 62437 62607 . + . 10.g3
10 AUGUSTUS exon 74427 74832 . - . 10.g4
10 AUGUSTUS exon 77230 77312 . - . 10.g4
10 AUGUSTUS exon 80858 80963 . - . 10.g4
10 AUGUSTUS exon 81384 81449 . - . 10.g4
10 AUGUSTUS exon 84076 84284 . - . 10.g4
10 AUGUSTUS exon 86396 86603 . + . 10.g5
10 AUGUSTUS exon 92171 92326 . + . 10.g5
10 AUGUSTUS exon 97612 97801 . + . 10.g5
10 AUGUSTUS exon 102323 102795 . + . 10.g5
10 AUGUSTUS exon 104180 104288 . + . 10.g5
10 AUGUSTUS exon 107156 107309 . + . 10.g5
10 AUGUSTUS exon 107417 107547 . + . 10.g5
10 AUGUSTUS exon 112961 113096 . + . 10.g5
10 AUGUSTUS exon 113512 113866 . + . 10.g5
10 AUGUSTUS exon 115101 115548 . + . 10.g5

How can I do that?
Thanks in advance.
jamo

Don_Cragun · April 14, 2013, 2:32pm

You could try:

awk '
$NF != first {
        if(last) print last
        print 
        first = $NF
        last = "" 
        next
}       
{       last = $0}
END {   if(last) print last
}' input

As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg/bin/awk, or nawk instead of awk.

Oops! I misread the request. The above script prints the 1st and last lines of each set, not the lines between the 1st and last lines.

Scrutinizer · April 14, 2013, 4:04pm

Another one:

awk -F. 'p!=$NF{p=$NF; s=x; next} s x{print s} {s=$0}' file

Don_Cragun · April 14, 2013, 4:47pm

Hi Scrutinizer,
I usually like your code (although I sometimes find it a little bit terse) and don't see anything unneeded. But, I don't understand why you have:

s x{print s}

rather than

s{print s}

Could you explain why you need to concatenate an empty string to s for this test?

Scrutinizer · April 14, 2013, 5:29pm

Hi Don, this is to force s into a string context. I have found this to work reliably across awks.

Don_Cragun · April 14, 2013, 5:40pm

OK. Given that the last field contains an alphabetic character, I didn't see the need for forcing it to be treated as a string in this case, but it is safer if other data doesn't match what was shown in the example. (The only time this would matter is when the final field has the numeric value "0".)
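
To make the string-context point concrete, here is a minimal sketch. The one-line input "0" is a made-up example, not taken from jamo's data. Since s is assigned $0, it holds the whole line; when that line looks like the number 0, a bare s test is evaluated numerically and is false, while s x (x being an unset variable, i.e. an empty string) is a non-empty string and is true:

```shell
# Hypothetical one-line input consisting only of "0".
out=$(printf '0\n' | awk '
{ s = $0 }                    # s gets the whole line; "0" looks numeric
s   { print "bare s: true" }  # numeric test: 0 is false, nothing printed
s x { print "s x: true" }     # string test: "0" is non-empty, so true
')
printf '%s\n' "$out"
```

With a POSIX-conforming awk this should print only the "s x: true" line.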

I also noted that you treated "." as the field separator while I assumed the default field separator (spaces and tabs). The original specification isn't at all clear on this point. If it matters, jamo will have to clarify what is wanted.
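
A quick way to see what each choice of field separator does to $NF, as a small sketch using one line in the format of the sample data (the variable names are only for illustration):

```shell
# One line in the format of the sample data from the question.
line='10 AUGUSTUS exon 4558 4669 . - . 10.g1'

# Default field separator (runs of blanks): the last field is column 9.
default_fs=$(printf '%s\n' "$line" | awk '{ print $NF }')

# "." as field separator: the last field is only the text after the final dot.
dot_fs=$(printf '%s\n' "$line" | awk -F. '{ print $NF }')

printf '%s\n%s\n' "$default_fs" "$dot_fs"
```

This prints 10.g1 for the default separator and g1 for -F. so the two scripts group on different keys, even though the result happens to be the same for this data.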

Scrutinizer · April 14, 2013, 5:55pm

Yes, in this case it probably will not matter, but it is safer. Another potentially problematic situation (probably not in this case) would be a line that consists entirely of spacing, which would then not get printed.
You are right about the dot FS. I assumed the OP meant the part after the last dot, but on rereading, it appears more likely he meant the "10.g.." in which case the field separator would need to be the default, and it would become:

awk 'p!=$NF{p=$NF; s=x; next} s x{print s} {s=$0}' file
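
For readers who find the one-liner terse, the same idea can be sketched in a longer, commented form. This is a variation, not the exact script above: the names prev and held are chosen here for illustration, an explicit comparison with "" stands in for the s x test, and the input is an abbreviated copy of the sample containing only groups 10.g1 and 10.g3.

```shell
# Abbreviated sample: groups 10.g1 and 10.g3 only, all nine columns.
sample='10 AUGUSTUS exon 4558 4669 . - . 10.g1
10 AUGUSTUS exon 8771 8889 . - . 10.g1
10 AUGUSTUS exon 16216 16284 . - . 10.g1
10 AUGUSTUS exon 17048 17135 . - . 10.g1
10 AUGUSTUS exon 50636 50833 . + . 10.g3
10 AUGUSTUS exon 59097 59219 . + . 10.g3
10 AUGUSTUS exon 60051 60138 . + . 10.g3
10 AUGUSTUS exon 61590 61689 . + . 10.g3
10 AUGUSTUS exon 62437 62607 . + . 10.g3'

between=$(printf '%s\n' "$sample" | awk '
$NF != prev {               # first line of a new group
    prev = $NF              # remember the group key (column 9)
    held = ""               # drop the held line: it was the last of the old group
    next                    # skip the first line of the group
}
held != "" { print held }   # another line followed, so the held line is not the last
{ held = $0 }               # hold the current line until we see what follows
')
printf '%s\n' "$between"
```

Each line is printed one step late, only once a further line of the same group has been seen. That delay is what keeps the last line of every group from being printed, including the last line of the file.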