How to print only lines in between patterns?

Hi,

I want to print only the lines in between the first and last line of each group of identical strings in column 9, that is, every line of a group except its first and last line.
There is a different number of lines in each group.

10 AUGUSTUS exon 4558 4669 . - . 10.g1
10 AUGUSTUS exon 8771 8889 . - . 10.g1
10 AUGUSTUS exon 16216 16284 . - . 10.g1
10 AUGUSTUS exon 17048 17135 . - . 10.g1
10 AUGUSTUS exon 17366 17525 . + . 10.g2
10 AUGUSTUS exon 19544 19603 . + . 10.g2
10 AUGUSTUS exon 20007 20109 . + . 10.g2
10 AUGUSTUS exon 23737 23937 . + . 10.g2
10 AUGUSTUS exon 25123 25203 . + . 10.g2
10 AUGUSTUS exon 27110 27939 . + . 10.g2
10 AUGUSTUS exon 50636 50833 . + . 10.g3
10 AUGUSTUS exon 59097 59219 . + . 10.g3
10 AUGUSTUS exon 60051 60138 . + . 10.g3
10 AUGUSTUS exon 61590 61689 . + . 10.g3
10 AUGUSTUS exon 62437 62607 . + . 10.g3
10 AUGUSTUS exon 74427 74832 . - . 10.g4
10 AUGUSTUS exon 77230 77312 . - . 10.g4
10 AUGUSTUS exon 80858 80963 . - . 10.g4
10 AUGUSTUS exon 81384 81449 . - . 10.g4
10 AUGUSTUS exon 84076 84284 . - . 10.g4
10 AUGUSTUS exon 86396 86603 . + . 10.g5
10 AUGUSTUS exon 92171 92326 . + . 10.g5
10 AUGUSTUS exon 97612 97801 . + . 10.g5
10 AUGUSTUS exon 102323 102795 . + . 10.g5
10 AUGUSTUS exon 104180 104288 . + . 10.g5
10 AUGUSTUS exon 107156 107309 . + . 10.g5
10 AUGUSTUS exon 107417 107547 . + . 10.g5
10 AUGUSTUS exon 112961 113096 . + . 10.g5
10 AUGUSTUS exon 113512 113866 . + . 10.g5
10 AUGUSTUS exon 115101 115548 . + . 10.g5

How can I do that?
Thanks in advance.
jamo

You could try:

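awk '
$NF != first {                  # first line of a new set (last field changed)
        if(last) print last     # print the saved last line of the previous set
        print                   # print this first line
        first = $NF
        last = ""
        next
}
{       last = $0}              # remember the most recent line of the current set
END {   if(last) print last     # print the last line of the final set
}' input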

As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg/bin/awk, or nawk instead of awk.

Oops! I misread the request: the above script prints the first and last lines of each set, not the lines between the first and last lines.
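For comparison, here is a different technique, not one proposed in this thread: read the input twice, counting the lines in each set on the first pass, and on the second pass print only lines that are neither the first nor the last of their set. This is a sketch under stated assumptions: it uses the default field separator (so the key is the whole last field, e.g. 10.g1), it assumes all lines of a set are adjacent as in the sample data, and print_inner_2pass is a hypothetical helper name. Because it reads the file twice, it needs a real file name rather than a pipe.

```shell
#!/bin/sh
# print_inner_2pass is a hypothetical helper name used only for this sketch.
# It reads the same file twice, so it needs a file name, not a pipe.
# Assumes the default FS (key = whole last field) and that each set's lines are adjacent.
print_inner_2pass() {
    awk '
    NR == FNR { n[$NF]++; next }     # pass 1: count the lines in each set
    { i[$NF]++ }                     # pass 2: position of this line within its set
    i[$NF] > 1 && i[$NF] < n[$NF]    # print lines that are neither first nor last
    ' "$1" "$1"
}

# Usage (input is the data file from the question):
# print_inner_2pass input
```

The trade-off against a single-pass approach is that the whole file is read twice and the per-set counts are kept in memory, but the logic is easy to adapt if, say, you also wanted to skip the first two lines of each set.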


Another one:

awk -F. 'p!=$NF{p=$NF; s=x; next} s x{print s} {s=$0}' file
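The same one-liner, spelled out with comments, might look like this. It is a sketch of how the logic works rather than a replacement: it keeps -F. exactly as in the one-liner, and print_inner and the sample lines are hypothetical. Each line is held in s and printed only when another line of the same set arrives, so the first line of a set is never stored, and the last one is thrown away when the set changes or the input ends.

```shell
#!/bin/sh
# print_inner is a hypothetical helper wrapping the one-liner above (same -F.).
print_inner() {
    awk -F. '
    p != $NF {      # first line of a new set (the key in the last field changed)
        p = $NF     # remember the new key
        s = x       # drop the buffered line: it was the last line of the previous set
        next        # never print or buffer the first line of a set
    }
    s x {           # a line is buffered (x is empty, so this is a string test)
        print s     # it was neither first nor last in its set, so print it
    }
    { s = $0 }      # buffer this line; it is printed only if its set continues
    ' "$@"
}

printf '%s\n' 'a 10.g1' 'b 10.g1' 'c 10.g1' 'd 10.g2' 'e 10.g2' 'f 10.g2' 'g 10.g2' | print_inner
# prints:
# b 10.g1
# e 10.g2
# f 10.g2
```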

Hi Scrutinizer,
I usually like your code (although I sometimes find it a little bit terse) and don't see anything unneeded. But I don't understand why you have:

s x{print s}

rather than

s{print s}

Could you explain why you need to concatenate an empty string to s for this test?


Hi Don, this is to force s into a string context. I have found this to work reliably across awks.
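A minimal demonstration of the difference, assuming a POSIX-style awk: x is never assigned, so s x is s concatenated with the empty string. The result is always a string, and a string is true exactly when it is non-empty. A bare s holding an input line that looks numeric, such as 0, is instead tested as the number 0 in POSIX-conforming awks, so it is false. demo_string_context is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# demo_string_context is a hypothetical helper used only for this illustration.
demo_string_context() {
    printf '0\n' | awk '
    { s = $0 }                   # s holds the input line "0", which looks numeric
    s   { print "s: true" }      # numeric test: usually false for "0" in POSIX awks
    s x { print "s x: true" }    # string test: "0" is a non-empty string, so true
    '
}

demo_string_context
```

In a POSIX-conforming awk only the s x line is printed; an awk that prints both is treating the input line as a plain string, which is exactly the variation that s x makes irrelevant.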


OK. Given that every line contains alphabetic characters, I didn't see the need to force s to be treated as a string in this case, but it is safer if other data doesn't match what was shown in the example. (The only time this would matter is when an entire saved line looks numeric with the value 0, such as a line containing just "0".)

I also noted that you treated "." as the field separator while I assumed the default field separator (spaces and tabs). The original specification isn't at all clear on this point. If it matters, jamo will have to clarify what is wanted.
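To make the difference concrete, this sketch prints what $NF holds for the first sample line under each choice. With -F. the key is only the text after the last dot (g1), so, hypothetically, adjacent sets ending in 10.g1 and 11.g1 would be merged into one; with the default separator the key is the whole last column (10.g1).

```shell
#!/bin/sh
# First line of the sample data from the question.
line='10 AUGUSTUS exon 4558 4669 . - . 10.g1'

# -F. splits on every dot, so $NF is whatever follows the last dot.
with_dot=$(printf '%s\n' "$line" | awk -F. '{ print $NF }')

# The default FS splits on runs of blanks, so $NF is the whole last column.
with_default=$(printf '%s\n' "$line" | awk '{ print $NF }')

echo "with -F.:   $with_dot"       # with -F.:   g1
echo "default FS: $with_default"   # default FS: 10.g1
```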


Yes, in this case it probably will not matter, but it is safer. Another potentially problematic situation (probably not in this case) would be a line that consists entirely of whitespace, which would then not get printed.
You are right about the dot FS. I assumed the OP meant the part after the last dot, but on rereading, it appears more likely he meant the whole "10.g.." field, in which case the field separator would need to be the default, and the command would become:

awk 'p!=$NF{p=$NF; s=x; next} s x{print s} {s=$0}' file
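As a quick check, here is the final command run on the first two sets (10.g1 and 10.g2) copied from the question. Feeding the lines through a here-document instead of a file is only a convenience for this example; with real data you would pass the file name as in the command above.

```shell
#!/bin/sh
# Final one-liner from the thread, fed the first two sets of the sample data.
inner=$(awk 'p!=$NF{p=$NF; s=x; next} s x{print s} {s=$0}' <<'EOF'
10 AUGUSTUS exon 4558 4669 . - . 10.g1
10 AUGUSTUS exon 8771 8889 . - . 10.g1
10 AUGUSTUS exon 16216 16284 . - . 10.g1
10 AUGUSTUS exon 17048 17135 . - . 10.g1
10 AUGUSTUS exon 17366 17525 . + . 10.g2
10 AUGUSTUS exon 19544 19603 . + . 10.g2
10 AUGUSTUS exon 20007 20109 . + . 10.g2
10 AUGUSTUS exon 23737 23937 . + . 10.g2
10 AUGUSTUS exon 25123 25203 . + . 10.g2
10 AUGUSTUS exon 27110 27939 . + . 10.g2
EOF
)
printf '%s\n' "$inner"
# Only the inner lines survive: the 2nd and 3rd 10.g1 lines and the
# 2nd to 5th 10.g2 lines (8771, 16216, 19544, 20007, 23737, 25123 in column 4).
```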