I will start with an example of what I'm trying to do and then describe how I am approaching the issue.
File
PS028,005 [JHRS-<Pr>] [ABC <Ob>]
Lexeme HRS # M #
PhraseType 1(1:1) 7(7)
PhraseLab 501[0] 503[0]
ClauseType ZYq0
PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]
Lexeme W # L> # BNH # M #
PhraseType 6(6) 11(11) 1(1:1) 7(7)
PhraseLab 509[0] 510[0] 501[0] 503[0]
ClauseType WxY0
Desired Output
PS028,005 ABC
PS028,005 XYZ
I would also be happy with the following, since I can strip the extra characters off by piping into sed:
PS028,005 [ABC <Ob>]
PS028,005 [XYZ <Ob>]
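For the bracketed form, one possible sketch uses awk's `match()`, which sets `RSTART` and `RLENGTH` to the position and length of the matched text so that `substr()` can pull it out. Its output is then piped into sed to strip the brackets. The function name `print_bracketed` is hypothetical. The sketch assumes the object inside `[... <Ob>]` contains no spaces (true for `ABC` and `XYZ`) and tolerates optional blanks before `PS`.

```shell
#!/bin/sh
# Sketch: print $1 of each PS line plus its "[... <Ob>]" group, for example
# "PS028,005 [ABC <Ob>]". Assumes the object text contains no spaces.
# print_bracketed is a hypothetical function name used for illustration.
print_bracketed() {
  awk '
    # Tolerate optional leading blanks before "PS".
    /^[ \t]*PS/ {
      # match() returns nonzero on success and sets RSTART/RLENGTH
      # to the start position and length of the matched text.
      if (match($0, /\[[^ ]* <Ob>\]/))
        print $1, substr($0, RSTART, RLENGTH)
    }
  ' "$@"
}

# Sample lines from the question; pass File instead to use the real data.
# The sed step turns "PS028,005 [ABC <Ob>]" into "PS028,005 ABC".
print_bracketed <<'EOF' | sed 's/\[\(.*\) <Ob>]$/\1/'
PS028,005 [JHRS-<Pr>] [ABC <Ob>]
Lexeme HRS # M #
PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]
Lexeme W # L> # BNH # M #
EOF
```

Dropping the sed stage (for example `print_bracketed File`) should give the bracketed lines shown above.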
In essence: when a line begins with /^ PS/, print $1 of that line along with the string between "[" and "<Ob>]". I can use sed to get the string between "[" and "<Ob>]", but I cannot get $1 (when $1 ~ /^ PS/) to print along with it.
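A pure-awk sketch of that rule follows. Two points shape it. First, under default field splitting `$1` never contains a space, so `$1 ~ /^ PS/` cannot be true; the sketch tests the whole line against `/^[ \t]*PS/` instead. Second, it locates `" <Ob>]"` with `index()`, which searches for a literal string, so it avoids regex escaping questions (in GNU awk, `\<` and `\>` are word-boundary operators, not literal angle brackets). The name `print_objects` is hypothetical, and only the first `<Ob>` group on each line is reported.

```shell
#!/bin/sh
# Sketch: for each line starting with "PS" (optionally after blanks), print
# $1 and the text between the last "[" before " <Ob>]" and " <Ob>]" itself.
# print_objects is a hypothetical name. Only the first " <Ob>]" is used.
print_objects() {
  awk '
    /^[ \t]*PS/ {
      # index() searches for a literal string, not a regex; 0 means absent.
      pos = index($0, " <Ob>]")
      if (pos == 0) next
      # Keep the text before " <Ob>]", e.g. "PS028,005 [JHRS-<Pr>] [ABC".
      obj = substr($0, 1, pos - 1)
      # Greedy ".*\[" deletes everything through the last "[", leaving "ABC".
      sub(/.*\[/, "", obj)
      print $1, obj
    }
  ' "$@"
}

# Sample lines from the question; pass File instead to use the real data.
print_objects <<'EOF'
PS028,005 [JHRS-<Pr>] [ABC <Ob>]
Lexeme HRS # M #
PhraseType 1(1:1) 7(7)
PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]
Lexeme W # L> # BNH # M #
EOF
```

Because awk reads any file names it is given, `print_objects File` should print the two desired lines directly. Unlike a field-based approach, this version also copes with an object that contains spaces.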
I have attempted:
awk '/^ PS/{print $1, $(/\[.*\<Ob\>\]/)}' File
Here I am attempting to use a non-constant field number; however, this seems to print the entire line containing the matching string.
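The behaviour in the first attempt can be reproduced in isolation. A regex constant used as an expression evaluates to 1 when the current line matches and 0 when it does not, so `$(/re/)` is either `$1` or `$0`, the whole record. If the regex fails to match (as `\<Ob\>` can in GNU awk, where `\<` and `\>` are word-boundary operators), the field number is 0 and the whole line prints. The sample line is taken from the question.

```shell
#!/bin/sh
# Sketch: $(expr) selects a field by a computed number. A regex used as an
# expression yields 1 (match) or 0 (no match), so $(/re/) is $1 or $0.
line='PS028,005 [JHRS-<Pr>] [ABC <Ob>]'

# Non-constant field number: $(1 + 1) is $2.
printf '%s\n' "$line" | awk '{ print $(1 + 1) }'   # prints [JHRS-<Pr>]

# The regex matches, so $(/ABC/) is $(1), the first field.
printf '%s\n' "$line" | awk '{ print $(/ABC/) }'   # prints PS028,005

# The regex does not match, so $(/XYZ/) is $(0), the whole line.
printf '%s\n' "$line" | awk '{ print $(/XYZ/) }'   # prints the whole line
```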
Another attempt has been this:
awk '/^PS/{a = $1; $2 = /\[.*\<Ob\>\]/}{print a,$2}' File
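In the second attempt, `$2 = /re/` stores the result of the match (1 or 0) in `$2` rather than the matched text, and assigning to a field makes awk rebuild `$0` with `OFS`. The second action block also has no pattern, so it runs on every input line, printing the most recently saved `a` next to each line's `$2`. The following sketch shows the assignment on the question's second `PS` line.

```shell
#!/bin/sh
# Sketch: "$2 = /re/" stores the match result (1 or 0) in $2; it does not
# copy the matched text. Assigning to a field also rebuilds $0 with OFS.
line='PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]'

# Matches, so $2 becomes 1:
# PS028,005 1 [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]
printf '%s\n' "$line" | awk '{ $2 = /XYZ/; print }'

# Does not match, so $2 becomes 0:
# PS028,005 0 [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]
printf '%s\n' "$line" | awk '{ $2 = /ABC/; print }'
```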
Finally, I have tried to use an array, and I must admit that even after reading the awk man page, I still find arrays confusing.
awk 'BEGIN{a[NR]=$0}{if(/\[.*\<Ob\>\]/ in a && $1 ~/^ PS/) print}' File
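Two details of awk arrays bear on the third attempt. A `BEGIN` block runs before any input is read, so `$0` is empty and `NR` is 0 there. Also, `(x in a)` tests whether `x` is an index of `a`, not whether it is one of the stored values. One way an array can help is a sketch that uses `split()` to fill an array with the line's blank-separated pieces and scans it for the `<Ob>]` element. The name `print_objects_split` is hypothetical. The sketch assumes each object is a single space-free token, as in the sample data, and prints every `<Ob>` group on a line.

```shell
#!/bin/sh
# Sketch: split() fills the array f with f[1]..f[n]; the loop scans it for
# the "<Ob>]" element and prints the element before it without its "[".
# Assumes each object is a single space-free token, as in the sample data.
# print_objects_split is a hypothetical name.
print_objects_split() {
  awk '
    /^[ \t]*PS/ {
      # With " " as separator, split() uses default blank splitting,
      # so f[1] holds the same text as $1.
      n = split($0, f, " ")
      for (i = 2; i <= n; i++) {
        if (f[i] == "<Ob>]") {
          obj = f[i - 1]
          sub(/^\[/, "", obj)   # drop the leading "["
          print f[1], obj
        }
      }
    }
  ' "$@"
}

# Sample lines from the question; pass File instead to use the real data.
print_objects_split <<'EOF'
PS028,005 [JHRS-<Pr>] [ABC <Ob>]
PhraseLab 501[0] 503[0]
PS028,005 [W-<Cj>] [L> <Ng>] [JBN-<Pr>] [XYZ <Ob>]
EOF
```

The fields `$1` through `$NF` already behave like an indexed list, so looping over `$i` directly would work just as well; `split()` is shown only because it makes the array explicit.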
Obviously, none of these has worked. I would greatly appreciate any help on what should be a relatively easy bit of code that I'm just not getting. Thanks in advance.