Parse A Log File

Ariean · May 8, 2014, 11:18am

Hello All,

Below is an excerpt from my Informatica log file. It contains 4 blocks of lines, each starting with WRITER_1_*_1. The full log file has many more blocks with the same pattern.

WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:21 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1001040    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:25 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1101144    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:27 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1201248    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:30 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1301352    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

From this I need to extract 2 lines from each block, as shown below. These 2 lines relate to XMLTgt_FCSLoans25::X_fc_Loan; basically,
what I am trying to achieve is to get the number of rows requested for Loan only.

PARSE 1:

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1001040    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1101144    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1201248    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1301352    Applied: 0          Rejected: 0          Affected: 0

PARSE 2:

My idea is to come up with a script that runs every 15 seconds, gets these values (parse 2), and inserts them into a table, from which a Java UI program would read them and display a progress bar in the front end for users.

I appreciate your response. Please find the complete log file attached.

Thank you.

in2nix4life · May 8, 2014, 11:29am

A simple approach to parse the file, if its format is fixed as displayed above:

 awk '/fc_Loan/{getline;print $6}'
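For readers new to getline, here is a minimal sketch of what the one-liner above does, run against part of one block from the question. The wrapper name loan_getline and the sample variable are made up for illustration; only the awk program comes from the post.

```shell
# loan_getline is a hypothetical wrapper; the awk program is the one-liner above.
loan_getline() {
    # On a line containing fc_Loan, getline loads the following line into $0
    # and re-splits it, so $6 is the sixth field of that following line.
    awk '/fc_Loan/{getline;print $6}' "$@"
}

# Part of one block copied from the sample in the question.
sample='WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:21 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1001040    Applied: 0          Rejected: 0          Affected: 0'

printf '%s\n' "$sample" | loan_getline    # prints 1001040
```

Note that nothing here checks that the following line is a WRT_8038 row, so whatever happens to sit in its sixth field is printed.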

vgersh99 · May 8, 2014, 11:31am

making some assumptions based on the sample....
parse1:

awk '$1 ~ "^WRITER_" {p=3;next} p && p-- && !p' RS= myFile

parse2:

awk '$1 ~ "^WRITER_" {p=3;next} p && p-- && !p {split($2,a,OFS); print a[6]}' RS= FS='\n' myFile
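A short sketch of why this works, assuming the file looks exactly like the sample: an empty RS puts awk in paragraph mode, so each blank-line-separated group of lines is one record, and FS='\n' makes each line of that group one field. The counter p is set to 3 on the WRITER_ paragraph and reaches 0 on the third paragraph after it, which is the Loan paragraph in the sample. The wrapper name loan_requested and the sample variable are illustrative only.

```shell
# loan_requested is a hypothetical wrapper around the parse2 command above.
loan_requested() {
    awk '$1 ~ "^WRITER_" {p=3;next} p && p-- && !p {split($2,a,OFS); print a[6]}' RS= FS='\n' "$@"
}

# The first two blocks from the question.
sample='WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:21 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1001040    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:25 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1101144    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0'

# "-" tells awk to read standard input after the RS= and FS= assignments.
printf '%s\n' "$sample" | loan_requested -    # prints 1001040 and 1101144
```

Because the Loan paragraph is found purely by position, this only holds while Loan stays the third target after each WRITER_ header.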

Ariean · May 8, 2014, 11:59am

I did this

awk '$1 ~ "^WRITER_" {p=3;next} p && p-- && !p {split($2,a,OFS); print a[6]}' RS= FS='\n' s_GenerateXMLDataFile.log

Output:

vgersh99 · May 8, 2014, 12:09pm

Your attached file has nothing to do with the data sample quoted in the original posting.
The solution works for the sample data given.
Please provide a representative sample.

Ariean · May 8, 2014, 1:18pm

As I mentioned, I took the excerpt from the attached log file because I didn't want to paste the whole log, which would look messy. The attached log file is the one I want to parse. I really appreciate your input.

Thank you

vgersh99 · May 9, 2014, 10:02am

There's no leading WRITER_ in the attached file. That was the key that you mentioned in your original post.
If your attached file IS the file to process, please take sample lines out of this file and show/explain how they should be processed and what output you're after.

This needs to be clarified so that we don't go in circles trying to solve a phantom/moving task.

Ariean · May 9, 2014, 12:06pm

Apologies for the confusion. Interestingly, when I download the log (the one I attached) with the Informatica client tools, it looks like the excerpt below, with a timestamp and some extra information on the first line.

2014-05-08 09:33:14 : INFO : (29223 | WRITER_1_*_1) : (IS | Dev_Int) : Dev_Node_01 : WRT_8161 : 
TARGET BASED COMMIT POINT  Thu May 08 09:33:13 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 700728     Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

But when I open the log with vi on the Linux server, it looks like this:

WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:13 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 700728     Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

Since I will be running my script on the Linux server, we should go with the vi version of the log output. When I run your piece of code against the log file on Linux, I get the output shown in my previous post; we need to get rid of the word 'this' in the output.

I copied the log file from the Linux server with WinSCP and attached it for your reference. Please take a look.

Thank you.

vgersh99 · May 9, 2014, 12:15pm

a simplified version - see if it helps - the samples are kind of small in scope...:

awk '$1 ~ "^WRITER_" {p=1;next} p&&/X_fc_Loan/{p++;next}; p==2{print $6;p=0}' myFile

Ariean · May 9, 2014, 12:17pm

I copied the log file from the Linux server with WinSCP and attached it for your reference. Please take a look.

vgersh99 · May 9, 2014, 12:25pm

OK, that's better. Here it is with a little more error checking:

awk '$1 ~ "^WRITER_" {p=1;next} p&&/X_fc_Loan/{p++;next}; p==2 && NF>=6 && $6 !~/[^0-9]/{print $6;p=0}' s_GenerateXMLDataFile.txt

Ariean · May 9, 2014, 12:52pm

Thank you, it worked. Would you mind splitting your awk script into chunks and explaining how each one achieves the desired output?

vgersh99 · May 9, 2014, 1:21pm

awk '
# if the first field starts with WRITER_, set the flag p to 1 and skip to processing the next line
$1 ~ "^WRITER_" {p=1;next}

# if flag p is not 0 AND there's a string X_fc_Loan on the current line, increment flag p and skip to processing the next line
p&&/X_fc_Loan/{p++;next}

# if flag p is equal to 2 (already saw the WRITER_ and X_fc_Loan lines) AND NumberOfFields (NF) is at least 6 AND the 6-th field contains ONLY digits (the last 2 conditions filter out incomplete lines AND lines where the 6-th field is not numeric), print the 6-th field and reset flag p to 0.
# Example of the invalid lines:
#WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
#WRT_8044 No data loaded for this target

p==2 && NF>=6 && $6 !~/[^0-9]/{print $6;p=0}
' s_GenerateXMLDataFile.txt
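To watch the flag move, here is an instrumented copy of the same three rules that also writes a trace of p to standard error. It is only a sketch: the names trace_loan and note and the sample variable are made up for illustration, and it assumes an awk that accepts "/dev/stderr" as an output file (gawk and mawk do).

```shell
# trace_loan is a hypothetical helper: the three rules above plus a trace on stderr.
trace_loan() {
    awk '
    function note(msg) { printf "line %d: %s\n", NR, msg > "/dev/stderr" }

    # rule 1: a WRITER_ line arms the flag
    $1 ~ "^WRITER_"  { p = 1; note("saw WRITER_, p=1"); next }

    # rule 2: an X_fc_Loan line while armed bumps the flag
    p && /X_fc_Loan/ { p++; note("saw X_fc_Loan, p=" p); next }

    # rule 3: with p at 2, the next line with a numeric 6th field is printed
    p == 2 && NF >= 6 && $6 !~ /[^0-9]/ { print $6; p = 0; note("printed field 6, p=0") }
    ' "$@"
}

# One block from the question, shortened to three targets.
sample='WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Thu May 08 09:33:21 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1001040    Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0'

# stdout carries the value (1001040); stderr carries the trace for lines 1, 8 and 9.
printf '%s\n' "$sample" | trace_loan
```

The Provider and Customer rows produce no trace lines because p is 1 or 0 when they are read, so rule 3 never applies to them.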

Now that we have it all squared away, could I get a discount on a new Bimmer?

Ariean · May 22, 2014, 11:30am

Hello,
For example, if the log file has the lines below, where X_fc_Loan is followed by "WRT_8044 No data loaded for this target", the command prints the value beneath X_fc_Customer, which it should not. How do I restrict it to print only the numbers below X_fc_Loan?

FYI, I modified your command a little bit, as shown below (print $6 instead of print $12).

awk '$1 ~ "^WRITER_" {p=1;next} p&&/X_fc_Loan/{p++;next}; p==2 && NF>=6 && $6 !~/[^0-9]/{print $6;p=0}' s_GenerateXMLDataFile.log.467

logfile excerpt:

WRITER_1_*_1> WRT_8168 End loading table [XMLTgt_FCSLoans25::X_fc_Customer] at: Tue May 20 12:35:24 2014
WRITER_1_*_1> WRT_8141
Commit on end-of-data  Tue May 20 12:35:24 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8044 No data loaded for this target


WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

WRITER_1_*_1> WRT_8165 TIMEOUT BASED COMMIT POINT
READER_1_3_1> RR_4050 First row returned from database to reader : (Tue May 20 12:36:54 2014)
WRITER_1_*_1> WRT_8167 Start loading table [XMLTgt_FCSLoans25::X_fc_Loan] at: Tue May 20 12:36:54 2014
WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Tue May 20 12:36:56 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 100320     Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

Output:

It should not have printed the numbers above 100320, which belong to X_fc_Customer. Please help.
Thank you.

---------- Post updated 05-22-14 at 11:30 AM ---------- Previous update was 05-21-14 at 05:11 PM ----------

Can someone please suggest any ideas? I am trying the command below, but without success.

awk '$1 ~ "^WRITER_" {p=1;next} p&&/X_fc_Loan/{p++;next}; p==2 && NF>=6 && $1 !~/[^WRT_8044]/ && $6 !~/[^0-9]/{print $6;p=0}' s_GenerateXMLDataFile.log.467
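A note on why this attempt prints nothing, followed by one possible fix that is not from the thread. [^WRT_8044] is a bracket expression matching any single character other than W, R, T, _, 8, 0 or 4, so $1 !~ /[^WRT_8044]/ is false for WRT_8038 (it contains a 3) and the valid rows are rejected as well. The underlying problem in the earlier command is that p stays at 2 when the line after the Loan header is not numeric, so the next numeric row (the Customer one, 583985 in this excerpt) gets printed. The sketch below decides on the first non-blank line after the Loan header and then resets p either way. It assumes the row count always sits on that line with message code WRT_8038, as in every excerpt posted here; loan_only and the sample variable are made-up names.

```shell
# loan_only is a hypothetical helper, not a command from the thread.
loan_only() {
    awk '
    $1 ~ "^WRITER_"       { p = 1; next }   # start of a writer block
    p == 1 && /X_fc_Loan/ { p = 2; next }   # Loan header seen
    p == 2 && NF {                          # first non-blank line after the header
        if ($1 == "WRT_8038" && $6 ~ /^[0-9]+$/) print $6
        p = 0                               # reset even when nothing was printed
    }
    ' "$@"
}

# The relevant part of the log excerpt from this post.
sample='WRITER_1_*_1> WRT_8141
Commit on end-of-data  Tue May 20 12:35:24 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8044 No data loaded for this target


WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0

WRITER_1_*_1> WRT_8165 TIMEOUT BASED COMMIT POINT
READER_1_3_1> RR_4050 First row returned from database to reader : (Tue May 20 12:36:54 2014)
WRITER_1_*_1> WRT_8167 Start loading table [XMLTgt_FCSLoans25::X_fc_Loan] at: Tue May 20 12:36:54 2014
WRITER_1_*_1> WRT_8161
TARGET BASED COMMIT POINT  Tue May 20 12:36:56 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 100320     Applied: 0          Rejected: 0          Affected: 0

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0'

printf '%s\n' "$sample" | loan_only    # prints 100320 only
```

The WRT_8167 "Start loading table" line also contains X_fc_Loan, but it begins with WRITER_, so the first rule consumes it before the Loan rule is tried.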