Hello all. I can get close to the output I am looking for but cannot quite hit it exactly, and I was hoping for your help.
Here is a sample from a text file with many thousands of lines, File 1:
PS001,001 HLK
PS002,004 L<G
PS004,002 XNN
PS004,006 BVX
PS004,006 ZBX=
PS005,007 DBR=
PS005,011 MRH
PS005,012 XSH
PS006,003 RP>
PS006,003 XNN
PS006,010 LQX
PS007,002 XSH
PS009,011 BVX
I have another large text file with many lines like these, File 2:
* 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
PS004,002 <NH 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
* 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
PS004,002 XNN 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
* 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
PS004,006 ZBX= 0 1 1 0 7 -1 -1 3 2 3 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,006 ZBX 0 2 -1 -1 -1 5 -1 -1 -1 3 2 1 2 0 -1 2 -1 -1 -1 -1 -1
PS004,006 YDQ 0 2 -1 -1 -1 1 -1 -1 -1 1 2 2 2 2 1 -10002 -1 -1 0 503 0
* 0 -3 200 1 201 0 0 .. 5 LineNr 24 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 14 TxtType: Q Pargr: 2.1 ClType:ZIm0
* 0 -2 523 1 122 0 0 .. 3 LineNr 32 ClauseNr 1: 1: 4: 142: 0 0 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:xQtX
PS006,010 CM< 0 1 0 0 1 -1 -1 2 3 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS006,010 JHWH 0 3 -1 -1 -1 1 -1 -1 -1 1 2 2 3 3 2 -1 -1 -1 0 502 0
PS006,010 TXNH 0 2 -1 -1 -1 3 -1 -1 -1 1 1 1 2 0 -1 -1 -1 -1 -1 -1 -1
PS006,010 J -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 2 2 -1 -1 -1 0 503 0
* 0 -1 122 1 112 0 0 .. 4 LineNr 33 ClauseNr 2: 1: 3: 112: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:ZQtX
* 0 -1 122 1 112 0 0 .. 4 LineNr 33 ClauseNr 2: 1: 3: 112: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:ZQtX
PS006,010 JHWH 0 3 -1 -1 -1 1 -1 -1 -1 1 2 2 3 3 2 -1 -1 -1 0 502 0
PS006,010 TPLH 0 2 -1 -1 -1 3 -1 -1 -1 1 1 1 2 0 -1 -1 -1 -1 -1 -1 -1
PS006,010 J -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 2 2 -1 -1 -1 0 503 0
PS006,010 LQX 0 1 2 0 1 -1 -1 1 3 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
* 0 -1 112 0 0 .. 5 LineNr 34 ClauseNr 3: 1: 3: 121: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:XYqt
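What I want: when $1 and $2 of a line in File 1 match $1 and $2 of a line in File 2, and the group of lines containing that match (the lines between two lines beginning with "*") also contains a line with $22=="503", then print that whole group, including the "*" lines that open and close it. So: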
* 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
PS004,002 XNN 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
* 0 -1 103 0 0 m. 7 LineNr 23 ClauseNr 1: 1: 1: 304: 0 0 SentenceNr 13 TxtType: Q Pargr: 2.1 ClType:MSyn
PS004,006 ZBX= 0 1 1 0 7 -1 -1 3 2 3 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,006 ZBX 0 2 -1 -1 -1 5 -1 -1 -1 3 2 1 2 0 -1 2 -1 -1 -1 -1 -1
PS004,006 YDQ 0 2 -1 -1 -1 1 -1 -1 -1 1 2 2 2 2 1 -10002 -1 -1 0 503 0
* 0 -3 200 1 201 0 0 .. 5 LineNr 24 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 14 TxtType: Q Pargr: 2.1 ClType:ZIm0
* 0 -1 122 1 112 0 0 .. 4 LineNr 33 ClauseNr 2: 1: 3: 112: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:ZQtX
PS006,010 JHWH 0 3 -1 -1 -1 1 -1 -1 -1 1 2 2 3 3 2 -1 -1 -1 0 502 0
PS006,010 TPLH 0 2 -1 -1 -1 3 -1 -1 -1 1 1 1 2 0 -1 -1 -1 -1 -1 -1 -1
PS006,010 J -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 2 2 -1 -1 -1 0 503 0
PS006,010 LQX 0 1 2 0 1 -1 -1 1 3 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
* 0 -1 112 0 0 .. 5 LineNr 34 ClauseNr 3: 1: 3: 121: -6 -11 SentenceNr 17 TxtType: Q Pargr: 2.1 ClType:XYqt
My current tactic is to first take File 2 and print only the groups of lines between "*" lines that contain a line with $22=="503":
gawk '{BUF = BUF ORS $0} $22=="503"{PRT=1}/^ *\*/{if(PRT) print BUF; BUF=$0; PRT=DL=""}' File2
Then I take File 1 and filter the previous output against it to find matches:
gawk 'FNR==NR{a[$1]; next} ($1) in a || $0 ~/\*/' File1 <(awk '{BUF = BUF ORS $0} $22=="503"{PRT=1}/^ *\*/{if(PRT) print BUF;BUF=$0; PRT=DL=""}' File2)
However, this method produces many false matches because the search criterion ($1 of File 1) is too ambiguous: every line of File 2 that shares the same $1 passes the filter, not only the groups I need. If I add $2 to the search criterion, it becomes too specific and the surrounding lines of the group are no longer printed.
So for example, given a hypothetical:
File 1a
PS004,002 XNN
File 2a
* 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
PS004,002 <NH 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
* 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
PS004,002 XNN 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
My sample code gives:
* 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
PS004,002 <NH 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
* 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
PS004,002 XNN 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
Rather than the desired:
* 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
PS004,002 XNN 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
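For reference, the block-level logic described above might be sketched in a single awk pass, run here against the hypothetical File 1a and File 2a. This is only a sketch under assumptions, not a tested solution for the full data: the file names file1a.txt and file2a.txt and the variable names want, buf, hit and has503 are made up for illustration; a line starting with "*" (optionally preceded by spaces) is assumed both to close one group and to open the next; a final group with no closing "*" line is not printed; and File 1 is assumed to be non-empty, since the FNR==NR idiom relies on that.

```shell
# Hypothetical scratch files holding File 1a and File 2a from the example.
tmp=$(mktemp -d)

cat > "$tmp/file1a.txt" <<'EOF'
PS004,002 XNN
EOF

cat > "$tmp/file2a.txt" <<'EOF'
* 0 1 55 0 0 .\ 1 LineNr 4 ClauseNr 1: 1: 2: 104: 505 11 SentenceNr 1 TxtType: Q Pargr: 2 ClType:InfC
PS004,002 <NH 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -1 55 1 103 2 123 3 200 0 0 .N 0 LineNr 5 ClauseNr 2: 1: 2: 133: 0 0 SentenceNr 1 TxtType: Q Pargr: 2 ClType:ZIm0
* 0 -2 123 0 0 .. 1 LineNr 7 ClauseNr 1: 1: 3: 132: 0 0 SentenceNr 2 TxtType: Q Pargr: 2 ClType:xQt0
PS004,002 XNN 0 1 1 0 1 -1 -1 3 2 1 2 -1 1 1 -1 -1 -1 -1 0 501 0
PS004,002 NJ -1 7 -1 -1 -1 -1 -1 -1 1 1 -1 -1 7 7 2 -1 -1 -1 0 503 0
* 0 -3 200 1 201 2 103 18 163 22 123 0 0 .. 0 LineNr 8 ClauseNr 1: 1: 2: 103: 0 0 SentenceNr 3 TxtType: Q Pargr: 2 ClType:ZIm0
EOF

out=$(awk '
    # First file: remember every ($1, $2) pair as a composite key.
    FNR == NR { want[$1, $2]; next }

    # A "*" line closes the current group and opens the next one.
    /^ *\*/ {
        # Print the buffered group plus this closing line only if the
        # group held a wanted pair AND some line with $22 == "503".
        if (inblock && hit && has503) print buf ORS $0
        buf = $0; inblock = 1; hit = has503 = 0
        next
    }

    # Data line: buffer it and record what the group contains.
    inblock {
        buf = buf ORS $0
        if (($1, $2) in want) hit = 1
        if ($22 == "503")     has503 = 1
    }
' "$tmp/file1a.txt" "$tmp/file2a.txt")

printf '%s\n' "$out"
```

The point of the sketch is that both conditions are tracked per group rather than per line, so the pair match and the 503 may sit on different lines of the same group. Because a "*" line is shared between neighbouring groups, it is printed twice when two adjacent groups both qualify.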
Thanks so much and sorry for the lengthy post. Hopefully I have described this accurately.