I am hoping to pull multiple strings from one file and use them to search within a block of text within another file.
File 1
PS001,001 HLK
PS002,004 MWQ
PS004,002 RXM
PS004,006 DBX
PS004,006 SBR
PS005,007 ML
PS005,009 DBR
PS005,011 MR
PS005,012 SBR
PS006,003 RXM
PS006,003 >SJ
PS006,010 QBL
File 2
PS001,001 [VWB-WHJ <Su>] [L-GBR> <PC>]
Lexeme VWB HJ== # L GBR #
PhraseType 2(2.1,7) 5(5,2.3)
PhraseLab 502[0] 521[0]
ClauseType NmCl
PS001,001 [D-<Re>] [B->WRX> D-<WL> <Co>] [L> <Ng>] [HLK <Pr>]
Lexeme D # B >WRX D <WL # L> # HLK #
PhraseType 6(6) 5(5,2.3,5,2.3) 11(11) 1(1:2)
PhraseLab 519[0] 504[0] 510[0] 501[0]
ClauseType xQt0
PS002,004 [W-<Cj>] [MRJ> <Su>] [NMJQ <Pr>] [B-HWN <Co>]
Lexeme W # MRJ> # MWQ # B HWN= #
PhraseType 6(6) 3(3.2) 1(1:1) 5(5,7)
PhraseLab 509[0] 502[0] 501[0] 504[0]
ClauseType WXYq
PS002,005 [HJ DJN <Mo>] [NMLL <Pr>] [<LJ-HWN <Co>] [B-RWGZ-H <Aj>]
Lexeme HJ= DJN= # ML # <L HWN= # B RWGZ H #
PhraseType 4(8,4) 1(1:1) 5(5,7) 5(5,2.1,7)
PhraseLab 508[0] 501[0] 504[0] 505[0]
ClauseType xYq0
PS005,012 [D-<Re>] [MSBRJN <PC>] [B-K <Co>]
Lexeme D # SBR # B K #
PhraseType 6(6) 1(1:6.2) 5(5,7)
PhraseLab 519[0] 521[0] 504[0]
ClauseType Ptcp
PS005,012 [W-<Cj>] [L-<LM <Ti>] [NCBXWN-<Pr>] [K <Ob>]
Lexeme W # L <LM # CBX # K #
PhraseType 6(6) 5(5,2.2) 1(1:1) 7(7)
PhraseLab 509[0] 506[0] 501[0] 503[0]
ClauseType WxY0
PS005,013 [>JK SKR> MQBLT> <Aj>] [T<VP-<Pr>] [NJ <Ob>]
Lexeme >JK SKR QBL # <VP # NJ #
PhraseType 5(5,2.3,13:62.3) 1(1:1) 7(7)
PhraseLab 505[0] 501[0] 503[0]
ClauseType xYq0
PS006,002 [MRJ> <Vo>]
Lexeme MRJ> #
PhraseType 3(3.2)
PhraseLab 562[0]
ClauseType Voct
PS006,002 [L> <Ng>] [B-RWGZ-K <Aj>] [TKS-<Pr>] [NJ <Ob>]
Lexeme L> # B RWGZ K # KS # NJ #
PhraseType 11(11) 5(5,2.1,7) 1(1:1) 7(7)
PhraseLab 510[0] 505[0] 501[0] 503[0]
ClauseType xYq0
I would like to print a block from File 2 when three criteria are all met: $1 of File 1 matches $1 in File 2, $0 in File 2 contains the string "<Co>", and $2 of File 1 matches a string *exactly* in File 2 on a line beginning with the word "Lexeme".
Thus, my desired output would look like this:
PS001,001 [D-<Re>] [B->WRX> D-<WL> <Co>] [L> <Ng>] [HLK <Pr>]
Lexeme D # B >WRX D <WL # L> # HLK #
PhraseType 6(6) 5(5,2.3,5,2.3) 11(11) 1(1:2)
PhraseLab 519[0] 504[0] 510[0] 501[0]
ClauseType xQt0
PS002,004 [W-<Cj>] [MRJ> <Su>] [NMJQ <Pr>] [B-HWN <Co>]
Lexeme W # MRJ> # MWQ # B HWN= #
PhraseType 6(6) 3(3.2) 1(1:1) 5(5,7)
PhraseLab 509[0] 502[0] 501[0] 504[0]
ClauseType WXYq
PS005,012 [D-<Re>] [MSBRJN <PC>] [B-K <Co>]
Lexeme D # SBR # B K #
PhraseType 6(6) 1(1:6.2) 5(5,7)
PhraseLab 519[0] 521[0] 504[0]
ClauseType Ptcp
With the following code I am able to satisfy two of the three criteria listed above: I can match $1 of File 1 with $1 of File 2, and I can detect when $0 of File 2 contains the string "<Co>". However, I am having difficulty with the last criterion, viz., matching $2 of File 1 with the exact string in File 2 when the line begins with "Lexeme".
NR==FNR {
    A[$1]
    B[$2]
    next
}
/^ Cl/ {
    if (PR1 && PR2 && PR3) {
        print "\n" BUF
        print
    }
    PR1 = PR2 = PR3 = 0
    BUF = ""
    next
}
{
    BUF = BUF (BUF?ORS:_) $0
    if ($1 in A) PR1 = 1
    if ($0 ~/\<Co\>/) PR2 = 1
    for (b in B) if($0 ~ b) PR3 = 1
}
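To illustrate how a test of the form `$0 ~ b` behaves, here is a minimal sketch, assuming a POSIX shell and awk. The helper name `match_kinds` is hypothetical and not part of the script above. It contrasts a regex match against the whole line with an exact string comparison against each whitespace-separated field.

```sh
# match_kinds LINE STRING  (hypothetical helper, for illustration only)
# Reports whether STRING matches somewhere in LINE when used as a regex,
# and whether it is exactly equal to one whitespace-separated field.
match_kinds() {
    printf '%s\n' "$1" | awk -v b="$2" '
    {
        regex_hit = ($0 ~ b) ? "yes" : "no"   # dynamic regex: also matches substrings
        exact_hit = "no"
        for (i = 1; i <= NF; i++)             # string equality on each whole field
            if ($i == b) exact_hit = "yes"
        print "regex=" regex_hit " exact=" exact_hit
    }'
}

match_kinds 'Lexeme W # MRJ> # MWQ # B HWN= #' 'MR'   # prints: regex=yes exact=no
match_kinds 'Lexeme D # SBR # B K #' 'SBR'            # prints: regex=yes exact=yes
```

In the first call "MR" is found inside "MRJ>", so the regex test succeeds even though no field equals "MR". A loop over every entry of B with `~` can therefore set PR3 on a partial match, and on lines other than the "Lexeme" line.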
I have also tried:
NR==FNR {
    A[$1]
    B[$2]
    next
}
/^ Cl/ {
    if (PR1 && PR2 && PR3) {
        print "\n" BUF
        print
    }
    PR1 = PR2 = PR3 = 0
    BUF = ""
    next
}
{
    BUF = BUF (BUF?ORS:_) $0
    if ($1 in A) PR1 = 1
    if ($0 ~/\<Co\>/) PR2 = 1
    if ($1 ~/ ^L/ && $0 in B) PR3 = 1
}
I think there might be something wrong with the way that I'm defining the "B" array with $2 of File 1 or defining the "for" loop in the script. Thank you so much in advance for your help.
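For reference, the three criteria could be combined along the following lines. This is only a sketch, and it rests on assumptions not stated above: each block in File 2 starts with its reference line and ends with its "ClauseType" line, and the reference and lexeme taken from one File 1 line must both match within the same block. The names `filter_records`, `want`, `ref`, `has_co`, and `lex_ok` are hypothetical.

```sh
# filter_records FILE1 FILE2  (hypothetical helper name)
# Prints each block of FILE2 that contains "<Co>" and whose Lexeme line
# holds, as an exact field, a lexeme paired with the block reference in FILE1.
filter_records() {
    awk '
    # FILE1: remember each (reference, lexeme) pair, e.g. "PS001,001" with "HLK".
    NR == FNR { want[$1, $2]; next }

    # FILE2: an empty buffer means a new block starts on this line.
    buf == "" { ref = $1; has_co = lex_ok = 0 }

    # Collect every line of the current block.
    { buf = (buf == "" ? "" : buf ORS) $0 }

    # Literal substring test, so "<" and ">" are not treated as regex operators.
    index($0, "<Co>") { has_co = 1 }

    # Compare each field of the Lexeme line as a whole string.
    $1 == "Lexeme" {
        for (i = 2; i <= NF; i++)
            if ((ref, $i) in want) lex_ok = 1
    }

    # The ClauseType line closes the block: print it if both tests passed.
    $1 == "ClauseType" {
        if (has_co && lex_ok) print buf
        buf = ""
    }
    ' "$1" "$2"
}
```

It would be called as `filter_records file1 file2`. Keying the array on the pair `($1, $2)` rather than on two separate arrays keeps a lexeme from one reference from matching a block that belongs to another. `index()` is used for "<Co>" because GNU awk treats `\<` and `\>` as word-boundary operators rather than literal angle brackets.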