Match pattern only between certain lines in entire file

jvoot · June 19, 2018, 11:09pm

Hello, I have input that looks like this:

          * 0 -1 103 0 0 m. 7 LineNr 23 ClauseNr 1: 1: 1: 304: 0 0 SentenceNr 13 TxtType: Q Pargr: 2.1 ClType:MSyn
 PS004,006 ZBX=                0   1  1  0  7 -1 -1    3  2  3  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,006 ZBX                 0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1       2      -1      -1   -1   -1    -1
 PS004,006 YDQ                 0   2 -1 -1 -1  1 -1   -1 -1  1  2     2   2   2   1  -10002      -1      -1    0  503     0
           * 0 -3 200 1 201 0 0 .. 5 LineNr 24 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 14 TxtType: Q Pargr: 2.1 ClType:ZIm0
 PS004,006 W                   0   6 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
 PS004,006 BVX                 0   1  1  0  7 -1 -1    3  2  3  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,006 >L                  0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
 PS004,006 JHWH                0   3 -1 -1 -1  1 -1   -1 -1  1  2     2   3   5   2      -1      -1      -1    0  504     0
           * 0 -1 201 0 0 .. 6 LineNr 25 ClauseNr 1: 1: 3: 153: 0 0 SentenceNr 15 TxtType: Q Pargr: 2.1 ClType:WIm0
 PS004,007 RB                  0  13 -1 -1 -1  4 -1   -1 -1  3  2     2   2   2   1      -1      -1      -1    0  502     0
 PS004,007 >MR                -1   1  0  0  1  4 -1    6  0  3  2     2   1   1  -1      -1      -1      -1    0  521     0
           * 0 -18 163 1 999 2 136 0 0 .# 2 LineNr 26 ClauseNr 1: 1: 2: 106: 0 0 SentenceNr 16 TxtType: Q Pargr: 2.2 ClType:Ptcp
 PS004,007 MJ                  0   9 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   9   9   1      -1      -1      -1    0  502     0
 PS004,007 R>H                 0   1  2  2  1 -1 -1    1  3  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,007 NW                 -1   7 -1 -1 -1 -1 -1   -1  1  3 -1    -1   7   7   2      -1      -1      -1    0  503     0
 PS004,007 VWB                 0  13 -1 -1 -1  1 -1   -1 -1  1  0     2   2   2   1      -1      -1      -1    0  503     0
           * 0 -1 999 0 0 .q 4 LineNr 27 ClauseNr 1: 1: 4: 121: 0 0 SentenceNr 17 TxtType: QQ Pargr: 2.2.1 ClType:XYqt

I would like to use awk, sed, or grep to match a regex and print not only the line that contains the match, but also the lines before and after it, up to the nearest line that begins with a certain character.

So, for example, in the input above, if I match the pattern "BVX" in field 2, I would like the output to include not only that line, but also everything from the nearest preceding line beginning with "*" through the nearest following line beginning with "*".

Thus the desired output would be:

           * 0 -3 200 1 201 0 0 .. 5 LineNr 24 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 14 TxtType: Q Pargr: 2.1 ClType:ZIm0
 PS004,006 W                   0   6 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
 PS004,006 BVX                 0   1  1  0  7 -1 -1    3  2  3  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,006 >L                  0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
 PS004,006 JHWH                0   3 -1 -1 -1  1 -1   -1 -1  1  2     2   3   5   2      -1      -1      -1    0  504     0
           * 0 -1 201 0 0 .. 6 LineNr 25 ClauseNr 1: 1: 3: 153: 0 0 SentenceNr 15 TxtType: Q Pargr: 2.1 ClType:WIm0

This is a very long file in which a given pattern (such as "BVX" in the example) can occur multiple times. For each match of "BVX", I would like to print the lines before it, stopping at /^\*/, and the lines after it, stopping at /^\*/.

I have attempted combinations of grep and sed, but to no avail, e.g.

grep -C5 "BVX" input | sed -n '/\*/,/\*/p'

Thank you so much in advance.

RudiC · June 20, 2018, 3:20am

How about

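awk '
                {BUF = BUF ORS $0               # append every line to the buffer
                }
$2 == "BVX"     {PRT = 1                        # flag the block when field 2 matches
                }
/^ *\*/         {if (PRT) print BUF             # a "*" line closes the block: print it if flagged
                 BUF = $0                       # then start the next block with this "*" line
                 PRT = ""
                }
' file

which, for your sample, gives:
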
           * 0 -3 200 1 201 0 0 .. 5 LineNr 24 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 14 TxtType: Q Pargr: 2.1 ClType:ZIm0
 PS004,006 W                   0   6 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
 PS004,006 BVX                 0   1  1  0  7 -1 -1    3  2  3  2    -1   1   1  -1      -1      -1      -1    0  501     0
 PS004,006 >L                  0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
 PS004,006 JHWH                0   3 -1 -1 -1  1 -1   -1 -1  1  2     2   3   5   2      -1      -1      -1    0  504     0
           * 0 -1 201 0 0 .. 6 LineNr 25 ClauseNr 1: 1: 3: 153: 0 0 SentenceNr 15 TxtType: Q Pargr: 2.1 ClType:WIm0
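
In case it helps to see the logic spelled out: the script keeps appending lines to a buffer, sets a flag when field 2 equals the search string, and on every line starting with "*" prints the buffer if the flag is set, then restarts the buffer from that "*" line. Below is a rough sketch of the same idea with two changes that are assumptions, not part of the script above: the search string is passed in through a variable (pat, an illustrative name), and an END rule prints a matching block that is cut short by the end of the file. The sample data is made up and heavily shortened.

```shell
# Hypothetical, shortened sample; the real lines carry many more columns.
sample='   * header 1
 PS004,006 ZBX 0
   * header 2
 PS004,006 W 0
 PS004,006 BVX 0
   * header 3
 PS004,007 RB 0
   * header 4
 PS004,007 BVX 0'

# pat is an illustrative variable name, not taken from the original script.
result=$(printf '%s\n' "$sample" | awk -v pat="BVX" '
          { buf = (buf == "" ? $0 : buf ORS $0) }  # collect every line of the current block
$2 == pat { found = 1 }                            # remember that this block holds a match
/^ *\*/   { if (found) print buf                   # a "*" line closes the block
            buf = $0; found = 0 }                  # and opens the next one
END       { if (found) print buf }                 # flush a matching block cut off by end of file
')

printf '%s\n' "$result"
```

With real data, drop the printf pipeline and give the file name as the last argument to awk. Without the END rule, a match that comes after the last "*" line in the file would not be printed at all.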

jvoot · June 20, 2018, 11:58am

Works like a charm, RudiC! Thank you so much! Now I have to go and figure out how it works.