I have a question about changing how we currently parse the following file:
input FILE123
TAGA01: 01
TAG02: daadsf
TAG03: adfasdf
TAGBBB04: 35
TAG05: asdfa
TAG07: adfd
TAG07: adfa3
TAG07: 234234
TAGCC08: 3525df
TAG09: adsfa
TAG10: 245
TAG11: nnnn
EOR:
TAGA01: 02
TAG02: abas
TAG03: asdfasd
TAGBBB04: E
TAG05: asdfasd
TAG07: acvasc
TAG07: czcvc
TAG07: 22
TAGCC08: adsfasd
TAG09: Y
TAG11: yyyy
EOR:
.
.
.
Note that some tags may not be in a record, and some tags may repeat in the same record.
I need to convert this to the following one-line-per-record format, trimmed so that the tag names don't appear. The delimiter doesn't matter, and I can change it if the data in other files happens to contain it:
Format:
TAGA01 TAGCC08 TAGBBB04 TAG09 TAG11
output.file:
01 3525df 35 adsfa nnnn
02 adsfasd E Y yyyy
.
.
.
Here is what is used currently (from memory, so the syntax isn't exact, but the idea is there):
cat FILE123 | egrep "^(TAGA01|TAGBBB04|TAGCC08|TAG09|TAG11)" | awk -F. -f awkfile.awk > output.file
where awkfile.awk contains a series of if statements and a printf output statement (again, the syntax and the substring offsets are not exact, but the idea is there):
if ($1=="TAGA01") {pTAGA01=substr($1,3)}
.
.
.
if ($1=="TAG11") {
pTAG11=substr($1,4)
printf pTAGA01 ... pTAG11
}
I wanted to see different ideas for two reasons: first, to see whether this could be made more efficient, since every line is run through multiple ifs; and second, simply to learn something new.
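For illustration, one possible direction is to replace the chain of ifs with a single array lookup per line. This is only a minimal sketch: it assumes every line looks like "TAG: value", that "EOR:" closes each record, and that a space is an acceptable delimiter. The names want, sep, order, keep and rec are made up for the example, and the heredoc just recreates the two sample records from above.

```shell
# Recreate the sample input shown above (two records).
cat > FILE123 <<'EOF'
TAGA01: 01
TAG02: daadsf
TAG03: adfasdf
TAGBBB04: 35
TAG05: asdfa
TAG07: adfd
TAG07: adfa3
TAG07: 234234
TAGCC08: 3525df
TAG09: adsfa
TAG10: 245
TAG11: nnnn
EOR:
TAGA01: 02
TAG02: abas
TAG03: asdfasd
TAGBBB04: E
TAG05: asdfasd
TAG07: acvasc
TAG07: czcvc
TAG07: 22
TAGCC08: adsfasd
TAG09: Y
TAG11: yyyy
EOR:
EOF

# want = tags to keep, in output order; sep = output delimiter (both assumptions).
awk -v want="TAGA01 TAGCC08 TAGBBB04 TAG09 TAG11" -v sep=" " '
BEGIN {
    n = split(want, order, " ")
    for (i = 1; i <= n; i++) keep[order[i]] = 1   # lookup table of wanted tags
}
{
    p = index($0, ":")
    if (p == 0) next                  # skip lines that carry no tag
    tag = substr($0, 1, p - 1)
    val = substr($0, p + 1)
    sub(/^[ \t]+/, "", val)           # trim blanks after the colon
}
tag == "EOR" {                        # end of record: emit the saved values
    line = ""
    for (i = 1; i <= n; i++)
        line = line (i > 1 ? sep : "") rec[order[i]]
    print line
    split("", rec)                    # portable way to clear the array
    next
}
tag in keep { rec[tag] = val }        # one hash lookup instead of many ifs
' FILE123 > output.file
```

With this layout the egrep pre-filter is no longer needed, and changing the column set or order only means editing want. Two caveats: a tag that is missing from a record produces an empty field (so a visible delimiter such as "|" may be safer than a space), and if a wanted tag repeats within a record, only its last value is kept.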
Thanks for your time!