Code to get unique values

Hello All,

I am trying to write a script that returns the clientID, programId and userID (the UA:MP:EP491418 field) from the log files below. The log file contains many such entries; I am just presenting a sample.

Sample log file:

[2018-05-09 08:01:24,251] hostname 1525867288264 UA:MP:EP491418 http-nio-8080-exec-11 ERROR Get Price Failed! EP491418 , UA , MP (AbstractLocalPricingServiceV1)
[2018-05-09 08:01:24,251] hostname 1525867288264 UA:MP:EP491418 http-nio-8080-exec-11 ERROR Exception while search for products (CartService)
com.xx.xx.pricing.pricingapi.exception.InvalidCartRequestException: Error: Failed to calculate US Tax. Error Number: '2' Error Description: 'A valid city/state or zip code required.' | Params: Address={
  "line1" : "6 allee de Mauregard",
  "line2" : "",
  "city" : "Gif sur Yvette",
  "stateCode" : "Essonne",
  "postalCode" : "91190",
  "countryCode" : "US"
} taxType=Sales

Expected output:

UA:MP:EP491418 
UA:MP:HR722540
UA:MP:EG359458

zgrep command:

zgrep -B1 "Failed to calculate US Tax" log-gr_base.log.2018-05-08.gz  | zgrep "CartService" |sort --unique

Output from the above zgrep command:

[2018-05-08 09:43:32,135] hostname 1525787013878 UA:MP:HR722540 http-nio-8080-exec-14 ERROR Exception while search for products (CartService)
[2018-05-08 09:43:41,636] hostname 1525787029223 UA:MP:HR722540 http-nio-8080-exec-45 ERROR Exception while search for products (CartService)
[2018-05-08 11:24:07,182] hostname 1525793054329 UA:MP:EG359458 http-nio-8080-exec-32 ERROR Exception while search for products (CartService)
[2018-05-08 11:24:26,796] hostname 1525793068993 UA:MP:EG359458 http-nio-8080-exec-45 ERROR Exception while search for products (CartService)
[2018-05-08 11:24:30,903] hostname 1525793073833 UA:MP:EG359458 http-nio-8080-exec-7 ERROR Exception while search for products (CartService)
[2018-05-08 11:24:54,760] hostname 1525793099189 UA:MP:EG359458 http-nio-8080-exec-5 ERROR Exception while search for products (CartService)
[2018-05-08 11:25:16,744] hostname 1525793124418 UA:MP:EG359458 http-nio-8080-exec-18 ERROR Exception while search for products (CartService)
[2018-05-08 11:25:21,622] hostname 1525793122821 UA:MP:EG359458 http-nio-8080-exec-2 ERROR Exception while search for products (CartService)

Please help me get only the unique clientID, programId, userID values. I am not familiar enough with sed to get this result.

Any help is appreciated.

Hmmm - are you sure?

Hello nextStep,

Could you please try the following and let me know if it helps.

awk 'match($0,/UA:MP:[a-zA-Z]+[0-9]+/) && !a[substr($0,RSTART,RLENGTH)]++{print substr($0,RSTART,RLENGTH)}'  Input_file

Thanks,
R. Singh


I am sorry about that; I hit the send button too quickly.

---------- Post updated at 05:11 AM ---------- Previous update was at 05:07 AM ----------

Hello Ravinder,

That code worked, many thanks. Pasting the output below.

 zgrep -B1 "Failed to calculate US Tax" logFile  | zgrep "CartService" |awk 'match($0,/UA:MP:[a-zA-Z]+[0-9]+/) && !a[substr($0,RSTART,RLENGTH)]++{print substr($0,RSTART,RLENGTH)}'

UA:MP:HR722540
UA:MP:EG359458

Could you please explain the logic you have used above?

Here are some alternatives you could try:

zgrep -B1 "Failed to calculate US Tax" log-gr_base.log.2018-05-08.gz | grep "CartService" | grep -Eo "[[:upper:]]{2}:[[:upper:]]{2}:[[:upper:]]{2}[[:digit:]]{6}" | sort -u

or

zgrep -B1 "Failed to calculate US Tax" log-gr_base.log.2018-05-08.gz | sed -n '/CartService/s/.*\(..:..:........\).*/\1/p' | sort -u

or

gzcat log-gr_base.log.2018-05-08.gz | awk '/Failed to calculate US Tax/{print p}{p=$5}' | sort -u

or

gzcat log-gr_base.log.2018-05-08.gz | awk '/Failed to calculate US Tax/ && !A[p]++{print p}{p=$5}' 

Hello nextStep,

The following explanation may help you with that.

awk '
## match() searches for "UA:MP:" followed by one or more letters and then one or more digits.
## On success it sets RSTART (start position of the match) and RLENGTH (length of the match).
## !a[key]++ is true only the first time key is seen, because the post-increment then marks it as seen.
match($0,/UA:MP:[a-zA-Z]+[0-9]+/) && !a[substr($0,RSTART,RLENGTH)]++{
   print substr($0,RSTART,RLENGTH)   ##Print RLENGTH characters starting at position RSTART, i.e. the matched text.
}' Input_file                        ##Mention the Input_file name here.

Thanks,
R. Singh
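To see the idiom in isolation, here is a minimal sketch that runs the same match()/substr() logic on three of the CartService lines quoted earlier in this thread. The shell function name unique_ids is hypothetical; it only wraps the awk program so it can be fed from a pipe.

```shell
#!/bin/sh
# Hypothetical wrapper around the match()/substr() + !seen[key]++ idiom.
unique_ids() {
  awk '
    match($0, /UA:MP:[a-zA-Z]+[0-9]+/) {
      id = substr($0, RSTART, RLENGTH)  # RSTART = start position, RLENGTH = match length
      if (!seen[id]++)                  # true only the first time this id is seen
        print id
    }'
}

printf '%s\n' \
  '[2018-05-08 09:43:32,135] hostname 1525787013878 UA:MP:HR722540 http-nio-8080-exec-14 ERROR Exception while search for products (CartService)' \
  '[2018-05-08 09:43:41,636] hostname 1525787029223 UA:MP:HR722540 http-nio-8080-exec-45 ERROR Exception while search for products (CartService)' \
  '[2018-05-08 11:24:07,182] hostname 1525793054329 UA:MP:EG359458 http-nio-8080-exec-32 ERROR Exception while search for products (CartService)' |
  unique_ids
# prints:
# UA:MP:HR722540
# UA:MP:EG359458
```

Unlike sort -u, this keeps the IDs in the order they first appear in the log.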


Thanks. gzcat was not recognized. Could you please help me understand these two commands:

sed -n '/CartService/s/.*\(..:..:........\).*/\1/p' | sort -u
grep -Eo "[[:upper:]]{2}:[[:upper:]]{2}:[[:upper:]]{2}[[:digit:]]{6}" 

Sometimes you need to use zcat (or gunzip -c) instead of gzcat.

The sed statement:

sed -n                         # -n: Do not automatically print records
/CartService/                  # Look for lines that contain "CartService"
s/.*\(..:..:........\).*/\1/p  # Replace the whole line with the LAST part of it that looks like:
                               # 2 chars - colon - 2 chars - colon - 8 chars
                               # (the greedy leading .* skips earlier candidates, such as the time stamp)
                               # \1 is a so-called back reference: it stands for the text matched inside the escaped parentheses \( \)
                               # The final p prints the line only if the substitution succeeded.
                               # In short: on lines containing "CartService", replace the entire line with the ID and print it.
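A quick way to check the sed explanation is to run the command on one of the CartService lines from this thread. This is a sketch; extract_with_sed is a hypothetical wrapper name. The earlier time stamp 09:43:32,135 also fits ..:..:........, which is why the greedy .* matters.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed command explained above.
extract_with_sed() {
  # -n: no automatic printing; the trailing p prints only lines where
  # the substitution succeeded. The greedy .* in front pushes the
  # \(..:..:........\) group to the LAST place on the line where it fits,
  # so the time stamp near the start of the line is skipped.
  sed -n '/CartService/s/.*\(..:..:........\).*/\1/p'
}

line='[2018-05-08 09:43:32,135] hostname 1525787013878 UA:MP:HR722540 http-nio-8080-exec-14 ERROR Exception while search for products (CartService)'
printf '%s\n' "$line" | extract_with_sed                          # prints UA:MP:HR722540
printf '%s\n' 'no match here (CartService)' | extract_with_sed    # prints nothing
```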

The grep statement:
-E means use Extended Regular Expression.
-o means print only the matching part of the line (this option is non-standard and is not available in every grep utility).
[[:upper:]]{2}:[[:upper:]]{2}:[[:upper:]]{2}[[:digit:]]{6} means 2 uppercase letters - colon - 2 uppercase letters - colon - 2 uppercase letters + 6 digits.
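The grep variant can be tried the same way. In this sketch (extract_ids_grep is a hypothetical name), the time stamp is not printed because it contains no uppercase letters, and sort -u then removes the repeated IDs. It assumes a grep that supports -o, such as GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical wrapper around the grep command explained above.
# -E: extended regular expressions; -o: print only the matched text.
extract_ids_grep() {
  grep -Eo '[[:upper:]]{2}:[[:upper:]]{2}:[[:upper:]]{2}[[:digit:]]{6}'
}

printf '%s\n' \
  '[2018-05-08 09:43:32,135] hostname 1525787013878 UA:MP:HR722540 http-nio-8080-exec-14 ERROR Exception while search for products (CartService)' \
  '[2018-05-08 09:43:41,636] hostname 1525787029223 UA:MP:HR722540 http-nio-8080-exec-45 ERROR Exception while search for products (CartService)' \
  '[2018-05-08 11:24:07,182] hostname 1525793054329 UA:MP:EG359458 http-nio-8080-exec-32 ERROR Exception while search for products (CartService)' |
  extract_ids_grep | sort -u
# prints:
# UA:MP:EG359458
# UA:MP:HR722540
```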

Thanks a lot. Of the two, I found the [[:upper:]] version more convenient and easier.

---------- Post updated at 06:49 AM ---------- Previous update was at 06:12 AM ----------

Hi Scrutinizer,

With either of the two methods, is it possible to get the date as well? A new requirement came up after seeing the output. Sorry for the late update.

2018-05-09 08:01:24,251

Hello nextStep,

The following awk command may help with that too.

awk 'match($0,/UA:MP:[a-zA-Z]+[0-9]+/) && !a[substr($0,RSTART,RLENGTH)]++{id=substr($0,RSTART,RLENGTH);sub(/\[/,"",$1);sub(/\]/,"",$2);print $1,$2,id}'   Input_file

Here is a non-one-liner form of the same solution too.

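awk '
match($0,/UA:MP:[a-zA-Z]+[0-9]+/) && !a[substr($0,RSTART,RLENGTH)]++{
   id=substr($0,RSTART,RLENGTH);   ##Save the ID first: changing $1/$2 below rebuilds $0 and shifts positions.
   sub(/\[/,"",$1);                ##Remove "[" from the date.
   sub(/\]/,"",$2);                ##Remove "]" from the time.
   print $1,$2,id
}'   Input_file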

Thanks,
R. Singh
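One subtlety when mixing match() with sub() on fields: assigning to a field makes awk rebuild $0, so after "[" and "]" are removed the ID starts two characters earlier than RSTART says. The sketch below (date_and_id is a hypothetical name) captures the ID with substr() before editing $1 and $2, then prints the date, time and ID once per ID.

```shell
#!/bin/sh
# Hypothetical wrapper: print date, time and ID, once per ID.
date_and_id() {
  awk 'match($0, /UA:MP:[a-zA-Z]+[0-9]+/) {
         id = substr($0, RSTART, RLENGTH)   # capture while RSTART is still valid
         if (seen[id]++) next               # skip IDs already printed
         sub(/\[/, "", $1)                  # "[2018-05-09"   -> "2018-05-09" (rebuilds $0)
         sub(/\]/, "", $2)                  # "08:01:24,251]" -> "08:01:24,251"
         print $1, $2, id
       }'
}

printf '%s\n' '[2018-05-09 08:01:24,251] hostname 1525867288264 UA:MP:EP491418 http-nio-8080-exec-11 ERROR Exception while search for products (CartService)' |
  date_and_id
# prints: 2018-05-09 08:01:24,251 UA:MP:EP491418
```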


That also worked. Thanks a lot, Ravinder.

You could try:

zgrep -B1 "Failed to calculate US Tax" log-gr_base.log.2018-05-08.gz | grep "CartService" | grep -Eo '.{4}-.{2}-[^]]*|[[:upper:]]{2}:[[:upper:]]{2}:[[:upper:]]{2}[[:digit:]]{6}' 

or

zgrep -B1 "Failed to calculate US Tax" log-gr_base.log.2018-05-08.gz | grep "CartService" | grep -Eo '.{4}-.{2}-[^]]*|.{2}:.{2}:.{8}'

A simple dot suffices here because the time stamp, which .{2}:.{2}:.{8} would also match, has already been consumed by the first (date) expression. The vertical bar (|) means "OR" (alternation).
[^]] means any character other than "]".
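Running the looser pattern on a single CartService line from the thread shows how the alternation splits the work between the two expressions. date_and_id_grep is a hypothetical wrapper name; like the commands above, it assumes grep supports -o.

```shell
#!/bin/sh
# Hypothetical wrapper around the grep from the post above.
# -o prints each non-overlapping match on its own line, scanning left to right.
# Alternative 1, .{4}-.{2}-[^]]*, matches "2018-05-08 09:43:32,135" (it stops before "]").
# Because that text is already consumed, alternative 2, .{2}:.{2}:.{8},
# can only match further along the line, i.e. the ID.
date_and_id_grep() {
  grep -Eo '.{4}-.{2}-[^]]*|.{2}:.{2}:.{8}'
}

line='[2018-05-08 09:43:32,135] hostname 1525787013878 UA:MP:HR722540 http-nio-8080-exec-14 ERROR Exception while search for products (CartService)'
printf '%s\n' "$line" | date_and_id_grep              # two lines: date/time, then ID
printf '%s\n' "$line" | date_and_id_grep | paste - -  # the same two values on one tab-separated line
```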

--
To get both results on one line:

zgrep -B1 "Failed to calculate US Tax" log-gr_base.log.2018-05-08.gz | grep "CartService" | grep -Eo '.{4}-.{2}-[^]]*|.{2}:.{2}:.{8}' | paste - - 

Thanks, Scrutinizer.

If you also want to sort -u, some sort implementations can do this (but that is not standard either):

sort -u -k3,3
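A sketch of that on paste-style output (a tab between the date/time and the ID), assuming a sort that honours -u together with -k as described. dedup_by_id is a hypothetical name, and the three input lines are built from time stamps and IDs quoted in this thread. Which of two lines with the same ID is kept may differ between implementations, so the sketch only relies on each ID appearing once.

```shell
#!/bin/sh
# Hypothetical wrapper: keep one line per ID.
# sort's default fields start at each blank-to-non-blank transition,
# so here field 1 is the date, field 2 the time and field 3 the ID.
dedup_by_id() {
  sort -u -k3,3
}

printf '2018-05-08 11:24:07,182\tUA:MP:EG359458\n2018-05-08 09:43:32,135\tUA:MP:HR722540\n2018-05-08 11:24:26,796\tUA:MP:EG359458\n' |
  dedup_by_id
```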

But your best bet might be:

gunzip -c log-gr_base.log.2018-05-08.gz | awk '/Failed to calculate US Tax/ && !A[p]++{print q OFS p} {split ($0,F,/[][]/); p=$5; q=F[2]}'
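The last one-liner remembers the previous line instead of relying on zgrep -B1: on every line it saves the ID ($5) and the bracketed date/time (F[2]), and when a "Failed to calculate US Tax" line arrives it prints what was saved from the line just before it, once per ID. Here is the same awk program run on a small, shortened sample built from lines in this thread, without gunzip -c so it works on plain text. report_failures is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program from the post above.
report_failures() {
  awk '/Failed to calculate US Tax/ && !A[p]++ { print q OFS p }   # p, q come from the previous line
       { split($0, F, /[][]/); p = $5; q = F[2] }'                 # F[2] = text between "[" and "]"
}

report_failures <<'EOF'
[2018-05-08 09:43:32,135] hostname 1525787013878 UA:MP:HR722540 http-nio-8080-exec-14 ERROR Exception while search for products (CartService)
com.xx.xx.pricing.pricingapi.exception.InvalidCartRequestException: Error: Failed to calculate US Tax. Error Number: '2'
[2018-05-08 09:43:41,636] hostname 1525787029223 UA:MP:HR722540 http-nio-8080-exec-45 ERROR Exception while search for products (CartService)
com.xx.xx.pricing.pricingapi.exception.InvalidCartRequestException: Error: Failed to calculate US Tax. Error Number: '2'
EOF
# prints: 2018-05-08 09:43:32,135 UA:MP:HR722540
```

The second failure for UA:MP:HR722540 is suppressed by !A[p]++, so no separate sort -u is needed and the first occurrence's time stamp is the one reported.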