Find pattern matches in certain fields of consecutive lines (awk)

I have a text file with many thousands of lines, a small sample of which looks like this:

InputFile:

PS002,003 D                  -1   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
PS002,003 PSQ                 0   1  7 18  1  0 -1    1  1  3 -1    -1   1   1  -1      -1      -1      -1    0  501     0
PS002,003 XNQ                 0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS002,003 HWN=                0   7 -1 -1 -1 -1 -1   -1  3  3  2    -1   7   2   2      -1      -1      -1    0  503     0
           * 0 -1 512 1 411 0 0 .q 4 LineNr 5 ClauseNr 1: 1: 3: 131: 0 0 SentenceNr 3 TxtType: ?Q      Pargr: 12 ClType:xYq0
           * 0 -2 111 1 411 0 0 .. 3 LineNr 10 ClauseNr 1: 1: 4: 131: 0 0 SentenceNr 6 TxtType: ?       Pargr: 1 ClType:xYq0
PS002,005 W                   0   6 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
PS002,005 B                   0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
PS002,005 XM>                 0   2 -1 -1 -1 11 -1   -1 -1  1  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS002,005 H                   0   7 -1 -1 -1 -1 -1   -1  3  1  2    -1   7   5   2      -1      -1      -1    0  505     0
PS002,005 DLX                 0   1  5 18  1  0 -1    1  3  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
PS002,005 >NWN                0   7 -1 -1 -1 -1 -1   -1  3  3  2    -1   7   7   2      -1      -1      -1    0  503     0
PS012,004 >BD                 0   1  5 15  1  0 -1    1  3  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
PS012,004 MRJ>                0   3 -1 -1 -1  1 -1   -1 -1  0  0     2   3   3   2      -1      -1      -1    0  502     0
PS012,004 KL                  0   2 -1 -1 -1  1 -1   -1 -1  0  0     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS012,004 HJN                 0   7 -1 -1 -1 -1 -1   -1  3  3  1    -1   7   2   2      -1      -1      -1    0  503     0
PS012,004 SP>                 0   2 -1 -1 -1 12 -1   -1 -1  3  1     3   2   0  -1      -1      -1      -1   -1   -1    -1
PS012,004 PLG                 0   1  6 18  1 12 -1   62 -1  3  1     3  13  -2   2      -1      -1      -1  -11  500     0

I would like to print every pair of consecutive lines in which the first line meets the conditions $16=="0" && $22=="-1" and the immediately following line meets $22=="503" && $4=="7" && $16=="2".

Desired Output:

PS002,003 XNQ                 0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS002,003 HWN=                0   7 -1 -1 -1 -1 -1   -1  3  3  2    -1   7   2   2      -1      -1      -1    0  503     0
PS012,004 KL                  0   2 -1 -1 -1  1 -1   -1 -1  0  0     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS012,004 HJN                 0   7 -1 -1 -1 -1 -1   -1  3  3  1    -1   7   2   2      -1      -1      -1    0  503     0

So far I have tried various revisions of the following awk code, which gets me fairly close:

awk '$16=="0" && $22=="-1"{f=$0; f++; next} $22=="503" && $4=="7"{n=$0} {print f"\n"n}' InputFile

Nevertheless, I still cannot figure out how to get this to work. I would very much appreciate any help in getting this one-liner to behave as desired. Thanks!

I'm not getting exactly your desired output, but something to start with:

awk '$16==0 && $22==-1 {l=$0;next} l && $22==503 && $4==7 {print l ORS $0;l=""}' myFile

produces:

PS002,003 XNQ                 0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS002,003 HWN=                0   7 -1 -1 -1 -1 -1   -1  3  3  2    -1   7   2   2      -1      -1      -1    0  503     0
PS002,005 XM>                 0   2 -1 -1 -1 11 -1   -1 -1  1  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS002,005 >NWN                0   7 -1 -1 -1 -1 -1   -1  3  3  2    -1   7   7   2      -1      -1      -1    0  503     0
PS012,004 KL                  0   2 -1 -1 -1  1 -1   -1 -1  0  0     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS012,004 HJN                 0   7 -1 -1 -1 -1 -1   -1  3  3  1    -1   7   2   2      -1      -1      -1    0  503     0

Thank you so much, vgersh99. Indeed, I left one condition (viz., $16=="2") out of the verbal description of my desired output. I have since fixed my original post. Thanks for pointing that out and for the help with the code.

A slight adjustment to the code you offered produces the desired output shown in the original post:

awk '$16==0 && $22==-1 {l=$0;next} l && $22==503 && $4==7 && $16=="2"{print l ORS $0;l=""}' InputFile

I always prefer to have a state variable and a store variable.

awk '
met==1 && $22=="503" && $4=="7" && $16=="2" {print save; print }
{ met=0 }
$16=="0" && $22=="-1" { save=$0; met=1 }
'

The { met=0 } rule clears the state on every line, so that the search only continues in the immediately following line.
Testing the second condition before the first one saves a next statement.


What behavior do you want with the following input file?

PS012,004 SP>                 0   2 -1 -1 -1 12 -1   -1 -1  3  1     3   2   0  -1      -1      -1      -1   -1   -1    -1
PS012,004 PLG                 0   1  6 18  1 12 -1   62 -1  3  1     3  13  -2   2      -1      -1      -1  -11  500     0
PS012,004 HJN                 0   7 -1 -1 -1 -1 -1   -1  3  3  1    -1   7   2   2      -1      -1      -1    0  503     0

In this case, @MadeInGermany, I would not want any output to be generated from your proposed input file.
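
As a quick check of that expectation, here is a small sketch that feeds the three lines above to both programs from this thread. The shell variable names (sample, adjacent, stale) are only illustrative. The state-variable version resets met on every line, so it should print nothing here. The adjusted one-liner keeps the stored line until the next match, so it should pair the SP> line with the HJN line even though they are not adjacent.

```shell
# The three lines from the post above, kept verbatim in a here-document.
sample=$(cat <<'EOF'
PS012,004 SP>                 0   2 -1 -1 -1 12 -1   -1 -1  3  1     3   2   0  -1      -1      -1      -1   -1   -1    -1
PS012,004 PLG                 0   1  6 18  1 12 -1   62 -1  3  1     3  13  -2   2      -1      -1      -1  -11  500     0
PS012,004 HJN                 0   7 -1 -1 -1 -1 -1   -1  3  3  1    -1   7   2   2      -1      -1      -1    0  503     0
EOF
)

# State-variable version: met is reset on every line, so only adjacent lines pair up.
adjacent=$(printf '%s\n' "$sample" | awk '
met==1 && $22=="503" && $4=="7" && $16=="2" {print save; print }
{ met=0 }
$16=="0" && $22=="-1" { save=$0; met=1 }
')

# Adjusted one-liner: l is not cleared by a non-matching line, so it stays set
# across the PLG line and is still available when HJN arrives.
stale=$(printf '%s\n' "$sample" | awk '$16==0 && $22==-1 {l=$0;next} l && $22==503 && $4==7 && $16=="2"{print l ORS $0;l=""}')

# Count the non-empty output lines of each variant.
printf 'state-variable version: %s line(s)\n' "$(printf '%s\n' "$adjacent" | grep -c .)"
printf 'adjusted one-liner:     %s line(s)\n' "$(printf '%s\n' "$stale" | grep -c .)"
```

If the one-liner is preferred, clearing l on every line that does not complete a pair would presumably give it the same adjacent-only behavior.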

Maybe it would be helpful if I gave some more sample data from my input.

InputSample

        * 0 -4 110 1 511 0 0 .. 4 LineNr 11 ClauseNr 1: 1: 3: 111: 0 0 SentenceNr 5 TxtType: Q       Pargr: 1 ClType:ZYqX
PS016,004 D                  -1   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   6   6  -1      -1      -1      -1    0  509     0
PS016,004 L>                  0  11 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1  11  11  -1      -1      -1      -1    0  510     0
PS016,004 NQJ                 0   1  4 18  1  0 -1    1  1  1 -1    -1   1   1  -1      -1      -1      -1    0  501     0
PS016,004 NWQJ                0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS016,004 HWN=                0   7 -1 -1 -1 -1 -1   -1  3  3  2    -1   7   2   2      -1      -1      -1    0  503     0
PS016,004 MN                  0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
PS016,004 DM                  0   2 -1 -1 -1  1 -1   -1 -1  0  2     3   2   5   2      -1      -1      -1    0  505     0           
        * 0 -1 620 0 0 .. 10 LineNr 14 ClauseNr 1: 1: 4: 132: 0 0 SentenceNr 12 TxtType: Q       Pargr: 1 ClType:xQt0
PS017,005 SMK                 0   1  0 18 11  0 -1    2  2  1  2    -1   1   1  -1      -1      -1      -1    0  501     0
PS017,005 HLK>                0   2 -1 -1 -1 12 -1   -1 -1  3  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS017,005 J                   0   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   2   2      -1      -1      -1    0  503     0
PS017,005 B                   0   5 -1 -1 -1 -1 -1   -1 -1 -1 -1    -1   5   0  -1      -1      -1      -1   -1   -1    -1
PS017,005 CBJL                0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS017,005 K                   0   7 -1 -1 -1 -1 -1   -1  2  1  2    -1   7   5   2      -1      -1      -1    0  504     0
        * 0 -3 122 1 11 0 0 .. 8 LineNr 15 ClauseNr 1: 1: 3: 102: 0 0 SentenceNr 13 TxtType: Q       Pargr: 1 ClType:ZQt0

Desired Output

PS016,004 NWQJ                0   2 -1 -1 -1  5 -1   -1 -1  3  2     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS016,004 HWN=                0   7 -1 -1 -1 -1 -1   -1  3  3  2    -1   7   2   2      -1      -1      -1    0  503     0
PS017,005 HLK>                0   2 -1 -1 -1 12 -1   -1 -1  3  1     1   2   0  -1      -1      -1      -1   -1   -1    -1
PS017,005 J                   0   7 -1 -1 -1 -1 -1   -1  1  1 -1    -1   7   2   2      -1      -1      -1    0  503     0

Thus, when there is a line that meets the conditions:

$16=="0" && $22=="-1"

Check the immediately following line to see if it has the conditions:

$4=="7" && $16=="2" && $22=="503"

If both sets of conditions are met, print both lines to the output; otherwise do nothing. Ideally, I would like to use the code I receive here as a template, varying the conditions on the various fields to extract a wide range of data patterns. I tried the code that you offered and, although I have not checked its accuracy in detail because of the size of the output, at first glance it seems to have worked perfectly.
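
One possible way to turn the state-variable approach into such a template is sketched below. It is not taken from any of the answers in this thread. The names pairs, first, second, armed and prev are made up for illustration. The two field tests live in small awk functions, so only those two lines need editing when the conditions change.

```shell
# pairs [FILE...]: print each line that satisfies first() together with the
# immediately following line, but only when that next line satisfies second().
# All names here are hypothetical; reads standard input when no FILE is given.
pairs() {
  awk '
    # Edit these two functions to change the field conditions.
    function first()  { return $16 == "0" && $22 == "-1" }
    function second() { return $4 == "7" && $16 == "2" && $22 == "503" }

    # Runs before the state update, so armed still describes the previous line.
    armed && second() { print prev; print }

    # Re-evaluated on every line: armed is true only for the line just read.
    { armed = first(); prev = $0 }
  ' "$@"
}
```

It would be called as pairs InputFile. Because armed is recomputed on every line, a stored line can never pair with anything but its direct successor, which matches the "immediately following line" requirement.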