Extracting from log file

Hi All,

I have a huge log file in which user information such as name, address, point balance, etc. is stored.
I need to extract only the point balance, first name, and last name.
How can I achieve this? I tried awk and jq but could not get the result.

Log file

app2.hostname/log.2017-08-05.gz:[2017-08-05 11:43:42,508] app2.hostname 1501947827514 NA:NA:NA http-nio-8080-exec-1 INFO  Rest Response: 200  [Headers: {X-Application-Context=[application:9015], Content-Type=[application/json;charset=UTF-8], Transfer-Encoding=[chunked], Date=[Sat, 05 Aug 2017 15:43:42 GMT]}] {"accountStatus":{"varStatusInfo":{},"accessType":"READ_ONLY"},"accountBalance":{"pointsBalance":111834},"userInformation":{"firstName":"xx","lastName":"xx","address":{"line1":"14 KETTLEWELL WAY","line2":"","city":"OTTAWA","stateCode":"ON","postalCode":"K2W1G3","countryCode":"CA"},"phoneNumbers":[{"type":"OTHER","number":"xx"}],"emailAddresses":[{"type":"OTHER","email":"xx@gmail.com"}],"additionalInfo":{"SourceCode":"2"}}} (LoggingRestRequestResponseInterceptor)

app10/log.2017-07-01.gz:[2017-07-01 00:20:56,836] app10.hostname 1498882864809 NA:NA:NA http-nio-8080-exec-23 INFO  Got successful response for the url GET http://hostname:8080/sss/accounts/999999. Response: {"accountBalance":{"pointsBalance":29878},"accountStatus":{"accessType":"READ_ONLY","varStatusInfo":{}},"userInformation":{"additionalInfo":{"SourceCode":"2"},"address":{"line1":"14879 86 AVENUE","line2":"","city":"yy","stateCode":"BC","postalCode":"V3S7E6","countryCode":"CA"},"emailAddresses":[{"email":"yy@gmail.com","type":"OTHER"}],"firstName":"yy","lastName":"yy","phoneNumbers":[{"number":"77777","type":"OTHER"}]}} (HttpClientUtil)

Please show your failed attempts.

This is what I tried.

Step 1: Converted the large log file into a JSON file as below

awk -F ' Response:|\\(HttpClientUtil)|' '{print $2}' logfile >> logfile.json

Step 2: From the JSON file, extracted pointsBalance, firstName and lastName using jq

cat logfile.json | jq . | egrep "pointsBalance|firstName|lastName"

Output:

"pointsBalance": 60153
    "firstName": "BETTY",
    "lastName": "BROOKS",
    "pointsBalance": 9870
    "firstName": "ROSS",
    "lastName": "MULLEN",

The problem is that the log file contains another keyword, (LoggingRestRequestResponseInterceptor), and Step 1 does not pick up the fields on lines with that keyword.

Step 1 only takes the text delimited by (HttpClientUtil).

If I can add the second pattern, (LoggingRestRequestResponseInterceptor), to Step 1, I believe it will output all the required fields.
This is where I am struggling.
---------- Post updated at 03:38 AM ---------- Previous update was at 01:41 AM ----------

I tried the code below to get the fields from lines with the keyword (LoggingRestRequestResponseInterceptor):

awk -F "{|}" '{print $9} {print $11}' logfile

Output

"pointsBalance":111834
"firstName":"KERRI","lastName":"MCGUIRE","address":
"additionalInfo":
,"address":

Then I tried to combine both awk statements, but I am still not getting the required output.

awk -F "{|}" '{print $9} {print $11}'| awk -F ' Response:|\\(HttpClientUtil)|' '{print $2}' logfile

Wow.

Although this modifies and extends the specs in post#1, it doesn't make things much clearer. And, your code samples don't really help...

Wildly guessing / assuming that

  • every record is on one single line
  • "in between" means "in the same line"
  • you want records with either keyword
  • the keyword doesn't need to be listed in the output

would this come close to what you need?
grep -E "LoggingRestRequestResponseInterceptor|HttpClientUtil" file | grep -Eo ".((first|last)Name|pointsBalance)[^,}]*"
"pointsBalance":111834
"firstName":"xx"
"lastName":"xx"
"pointsBalance":29878
"firstName":"yy"
"lastName":"yy"

If it doesn't, please become way more specific with your description of the problem.
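
In case one output line per record is easier to work with, here is a minimal sketch that feeds the same two grep commands into paste. It assumes that every matching record yields exactly three matches, with pointsBalance first, as in the two sample lines from post#1. The shortened records and the variable names `logfile` and `joined` are made up for illustration.

```shell
# Two hypothetical, shortened records modeled on the samples in post#1.
logfile=$(mktemp)
printf '%s\n' \
  '[2017-08-05 11:43:42,508] INFO Rest Response: 200 {"accountBalance":{"pointsBalance":111834},"userInformation":{"firstName":"xx","lastName":"xx"}} (LoggingRestRequestResponseInterceptor)' \
  '[2017-07-01 00:20:56,836] INFO Response: {"accountBalance":{"pointsBalance":29878},"userInformation":{"firstName":"yy","lastName":"yy"}} (HttpClientUtil)' \
  > "$logfile"

# Same two greps as above; paste then joins every three output lines with commas.
joined=$(grep -E "LoggingRestRequestResponseInterceptor|HttpClientUtil" "$logfile" |
  grep -Eo ".((first|last)Name|pointsBalance)[^,}]*" |
  paste -d, - - -)

printf '%s\n' "$joined"
rm -f "$logfile"
```

If a record is missing one of the three fields, the grouping shifts for all following records, so this only fits logs where every record is complete.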

Sorry if it was confusing. This is exactly the requirement, and it worked. Thank you.
Could you please explain what the code below does,
in particular the period at the beginning and the characters inside the square brackets?

grep -Eo ".((first|last)Name|pointsBalance)[^,}]*"

See man regex:

As the field values follow the field name and are terminated by either a comma or a right brace, the bracket expression [^,}]* matches any run of chars EXCEPT those two, so the match ends just before either.

The period at the beginning allows for any char, esp. the double quote for the field names. It's a lazy way to do so; you could also specify an escaped double quote.
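
For illustration, a sketch of the stricter variant with the escaped double quote in place of the period. The sample line is a made-up, shortened record, and the variable names `sample` and `result` are hypothetical.

```shell
# Hypothetical sample line, shortened from the log excerpt in post#1.
sample='INFO Response: {"accountBalance":{"pointsBalance":29878},"userInformation":{"firstName":"yy","lastName":"yy","city":"yy"}} (HttpClientUtil)'

# \" inside the double-quoted pattern is a literal double quote, so the match
# must start at the quote that opens the field name.
# [^,}]* then takes everything up to, but not including, the next comma or right brace.
result=$(printf '%s\n' "$sample" |
  grep -Eo "\"((first|last)Name|pointsBalance)[^,}]*")

printf '%s\n' "$result"
```

On this sample it should print the same three fields as the version with the period; the difference is only that a name like xfirstName would no longer match.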

Thank you.