Eliminate or ignore asterisks in data when parsing

SkySmart · March 31, 2017, 4:51pm

I have a data file that contains the following:

data.txt

.........
.........
PPJ97**2017PPJ97**2017-03-21-13.35.15.887208********************START ERROR LOGGING******************
PPJ97**2017-03-21-13.35.15.887208** PROMPT APPLICATION ERROR **
PPJ97**2017-03-21-13.35.15.887208** IN TIMESTAMP       | 2017-03-21-13.35.15.887208
PPJ97**2017-03-21-13.35.15.887208** OUT TIMESTAMP      | 2017-03-21-13.35.15.896223
PPJ97**2017-03-21-13.35.15.887208** RETURN CODE        | 08
PPJ97**2017-03-21-13.35.15.887208** ERROR KEY          | ed7-371-47e-a4-fff2ce|838101733 965079 M
PPJ97**2017-03-21-13.35.15.887208** ERROR MESSAGE      | MISMATCH IN PREPACK TYPE
.........
.........

I use the following code to pull out sections of the log only if they contain three different patterns:

awk -v p1="START.*ERROR" -v p2="PROMPT.*APPLICATION.*ERROR" -v p3="ERROR.*KEY" ' s!="" { s=s RS $0 ; if($0~p3) { if (s~p2) print s ; count++ ; s="" } } ; $0~p1 { s=$0 } END {print count} ' data.txt
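For readability, here is the same one-liner spread over several lines with comments. It is only a sketch with the logic unchanged. The file name sample_data.txt and the result variable are stand-ins made up for this example, and the input is the excerpt shown above without the elided lines.

```shell
# Hypothetical sample file holding the log excerpt from this post.
cat > sample_data.txt <<'EOF'
PPJ97**2017PPJ97**2017-03-21-13.35.15.887208********************START ERROR LOGGING******************
PPJ97**2017-03-21-13.35.15.887208** PROMPT APPLICATION ERROR **
PPJ97**2017-03-21-13.35.15.887208** IN TIMESTAMP       | 2017-03-21-13.35.15.887208
PPJ97**2017-03-21-13.35.15.887208** OUT TIMESTAMP      | 2017-03-21-13.35.15.896223
PPJ97**2017-03-21-13.35.15.887208** RETURN CODE        | 08
PPJ97**2017-03-21-13.35.15.887208** ERROR KEY          | ed7-371-47e-a4-fff2ce|838101733 965079 M
PPJ97**2017-03-21-13.35.15.887208** ERROR MESSAGE      | MISMATCH IN PREPACK TYPE
EOF

result=$(awk -v p1="START.*ERROR" -v p2="PROMPT.*APPLICATION.*ERROR" -v p3="ERROR.*KEY" '
# While a section is being collected, append the current line to it.
s != "" {
    s = s RS $0
    # A line matching p3 closes the section.
    if ($0 ~ p3) {
        # Print the section only if p2 matched somewhere inside it.
        if (s ~ p2) print s
        count++      # counts every closed section, printed or not
        s = ""
    }
}
# A line matching p1 starts a new section.
$0 ~ p1 { s = $0 }
END { print count }
' sample_data.txt)

printf '%s\n' "$result"
```

Note that count goes up for every section closed by a p3 line, whether or not that section was printed.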

This code works if I alter the data.txt file and get rid of all asterisks "*" and pipes "|", since these can be misinterpreted by the system or the awk command.

However, since I won't be able to alter the data.txt file in a real-world scenario (I don't have permission to), I'm looking for a way to have my awk command do that before parsing. Or, if there's a way to have awk ignore the asterisks and pipes altogether, that would be great.

Don_Cragun · March 31, 2017, 5:04pm

Instead of just showing us your code, please explain what you are trying to do.

In what way are asterisks and vertical bars in your input causing problems?

Counting the number of lines that match one or more of your three patterns is easy. What is the purpose of combining input lines into the string s?

What output are you hoping to produce from your sample input?