Search and filter by TAG

Lord_Spectre · January 27, 2011, 4:17pm

Hello all,
searching on a text file (log file) is quite simple:

grep -i texttosearch filename | grep something

What I'm trying to do is filter the result by TAG and remove the duplicate entries.

For example, if the log file contains the following text (fields are separated by commas):

20101024_201000:Counter,RESPONSE_FAIL,NODE,ApplicationAccessGroup.ServerGroup.Server.41700,1,47,0;
20101024_201000:Counter,RESPONSE_OK,NODE,ApplicationAccessGroup.Server.41880,1,15,0;
20101024_201000:Counter,RESPONSE_FAIL,TOTAL,Total,25459;
20101024_201000:Counter,RESPONSE_FAIL,TOTAL,Total,1;
20101025_215000:Counter,RESPONSE_FAIL,TOTAL,Total,15459;

After filtering, the output should be:

20101024_201000:Counter,RESPONSE_OK,NODE,ApplicationAccessGroup.Server.41880,1,15,0;
20101025_215000:Counter,RESPONSE_FAIL,TOTAL,Total,15459;

So: group by TAG (the second field, e.g. RESPONSE_FAIL or RESPONSE_OK) and show only the last entry for each.

Is it possible to do this in a simple way, or is it hard to do?

Hope my goal is clear!

Chubler_XL · January 27, 2011, 4:56pm

$ awk -F, '{A[$2]=$0} END{for(i in A) print A[i]}' infile
20101025_215000:Counter,RESPONSE_FAIL,TOTAL,Total,15459;
20101024_201000:Counter,RESPONSE_OK,NODE,ApplicationAccessGroup.Server.41880,1,15,0;
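
For anyone following along, here is a commented sketch of the same idea, run against the sample lines from the first post. The file name sample.log and the variable result are placeholders for illustration. The output is piped through sort only because awk visits the keys of "for (i in A)" in an unspecified order.

```shell
# Recreate the sample log from the first post (sample.log is a placeholder name).
cat > sample.log <<'EOF'
20101024_201000:Counter,RESPONSE_FAIL,NODE,ApplicationAccessGroup.ServerGroup.Server.41700,1,47,0;
20101024_201000:Counter,RESPONSE_OK,NODE,ApplicationAccessGroup.Server.41880,1,15,0;
20101024_201000:Counter,RESPONSE_FAIL,TOTAL,Total,25459;
20101024_201000:Counter,RESPONSE_FAIL,TOTAL,Total,1;
20101025_215000:Counter,RESPONSE_FAIL,TOTAL,Total,15459;
EOF

# -F, splits each line on commas, so $2 is the tag (RESPONSE_OK, RESPONSE_FAIL).
# A[$2]=$0 stores the whole line under its tag; a later line with the same tag
# overwrites the earlier one, so only the last line per tag survives.
result=$(awk -F, '{ A[$2] = $0 } END { for (i in A) print A[i] }' sample.log | sort)
printf '%s\n' "$result"
```

Because the array is keyed by tag, the number of output lines equals the number of distinct tags in the file.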

Lord_Spectre · January 27, 2011, 5:22pm

omg, that's awesome! All in one line!
Thanks Chubler, you are a Master, and I'm such a newbie!

And what if I want to apply the same filter using a string like "ApplicationAccessGroup"? (So not based on position, but filtering and grouping by text?)

Chubler_XL · January 27, 2011, 5:26pm

I'm not 100% sure what you want here. I assume you want to filter by a text string, but still group by field 2?

$ awk -F, '/ApplicationAccessGroup/ {A[$2]=$0} END{for(i in A) print A[i]}' infile
20101024_201000:Counter,RESPONSE_OK,NODE,ApplicationAccessGroup.Server.41880,1,15,0;

Lord_Spectre · January 27, 2011, 5:35pm

You're right! I was not clear!
I would like to filter and group ONLY by a text string (i.e. "ApplicationAccessGroup") and not by field position:

So, based on the same log from my previous post, the output should be:

20101024_201000:Counter,RESPONSE_OK,NODE,ApplicationAccessGroup.Server.41880,1,15,0;

What I don't understand is why you put:

{A[$2]=$0}

Does it mean it always refers to the second field (RESPONSE_OK)?

Chubler_XL · January 27, 2011, 5:59pm

So if I'm understanding this requirement correctly it could be written as:
The last record that contains "ApplicationAccessGroup"

grep "ApplicationAccessGroup" infile | tail -1
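
On the earlier question about {A[$2]=$0}: yes, $2 is always the second comma-separated field, which is why that version groups by position. A rough sketch of the same "last matching record" idea in awk alone, with no field reference at all, might look like this. The inline sample lines are taken from the first post, and the variable names last and result are just for illustration.

```shell
# Overwrite "last" with every line that contains the string;
# once the input is exhausted, print whatever was stored last.
result=$(printf '%s\n' \
  '20101024_201000:Counter,RESPONSE_FAIL,NODE,ApplicationAccessGroup.ServerGroup.Server.41700,1,47,0;' \
  '20101024_201000:Counter,RESPONSE_OK,NODE,ApplicationAccessGroup.Server.41880,1,15,0;' \
  '20101024_201000:Counter,RESPONSE_FAIL,TOTAL,Total,25459;' |
  awk '/ApplicationAccessGroup/ { last = $0 } END { if (last != "") print last }')
printf '%s\n' "$result"
```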

Lord_Spectre · January 28, 2011, 4:07am

Yep! It was too simple!

May I ask about one last combination? Last question!!!!! Sorry, this time it's very hard!
What if I want to filter by TAG but I have this log:

INFO 2011.01.27 20:58:11.00  com.log.SocketQueryListener$ServerListener createSocketConnection
  PROT_SOCKET_OPEN   Accepted socket connection for port 5018

INFO 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

INFO 2011.01.27 20:59:40.285  com.nbprotocol.connection.ServerSocketConnection onOpen
  PROT_SOCKET_OPEN   Open socket ServerSocketConnection : /192.168.1.27:56420//192.168.1.29:5000

INFO 2011.01.27 20:59:40.489  com.access.protocol.handler.ServerHandler 1234.1291717116.transitionState
  PROT_SOCKET_OPEN   CP connection 1234.1291717116 has changed state to Init

INFO 2011.01.27 20:59:40.489  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connecting [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}

INFO 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}

Well, I would like to filter and group like in my first post, but this time I have no commas as a reference and each log record spans two lines!
So the result should be:

INFO 2011.01.27 20:59:40.489  com.access.protocol.handler.ServerHandler 1234.1291717116.transitionState
  PROT_SOCKET_OPEN   CP connection 1234.1291717116 has changed state to Init

INFO 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

INFO 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}

rdcwayx · January 30, 2011, 6:03am

awk '{A[$6]=$0} END{for(i in A) print A[i]}' RS="" infile
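
A note on why this works: setting RS="" puts awk in paragraph mode, where blank lines separate records and a newline also counts as a field separator. The severity is then $1, the date $2, the time $3, the class $4, the method $5, and the tag at the start of the second line is $6. The sketch below assumes every record has exactly that five-word header; events.log is a placeholder name and the data is a subset of the log posted above.

```shell
cat > events.log <<'EOF'
INFO 2011.01.27 20:58:11.00  com.log.SocketQueryListener$ServerListener createSocketConnection
  PROT_SOCKET_OPEN   Accepted socket connection for port 5018

INFO 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

INFO 2011.01.27 20:59:40.489  com.access.protocol.handler.ServerHandler 1234.1291717116.transitionState
  PROT_SOCKET_OPEN   CP connection 1234.1291717116 has changed state to Init
EOF

# Show which word ends up in $6 for each two-line record.
tags=$(awk 'BEGIN { RS = "" } { print $6 }' events.log)
# Keep only the last record seen for each tag (output order is unspecified).
result=$(awk 'BEGIN { RS = "" } { A[$6] = $0 } END { for (i in A) print A[i] }' events.log)
printf '%s\n' "$tags"
printf '%s\n' "$result"
```

If a header line ever has more or fewer than five words, the tag will no longer be in $6 and the grouping key will be wrong.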

Lord_Spectre · January 30, 2011, 11:05am

Thank you, you are my hero :D, it works very well...
Now I need to study that code line since I need to filter the output by severity. So for example I need to remove the INFO and WARN severity from the output search.

So, starting with:

INFO 2011.01.27 20:59:40.489  com.access.protocol.handler.ServerHandler 1234.1291717116.transitionState
  PROT_SOCKET_OPEN   CP connection 1234.1291717116 has changed state to Init

WARN 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

FATAL 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

ERROR 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}

The output should be:

FATAL 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

ERROR 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}

Maybe I can play with the print statement, inserting a simple | grep -v!

---------- Post updated at 11:05 AM ---------- Previous update was at 10:46 AM ----------

Well, I solved it using a temporary file. So, first I filter by severity and then I filter by tag!

cat /home/user/log/Event*.log | sed -e '/./{H;$!d;}' -e 'x;/ERROR/!d;' | sed -e '/./{H;$!d;}' -e "x;/$1/!d;" >> output.tmp
cat /home/user/log/Event*.log | sed -e '/./{H;$!d;}' -e 'x;/FATAL/!d;' | sed -e '/./{H;$!d;}' -e "x;/$1/!d;" >> output.tmp

vgersh99 · January 30, 2011, 11:08am

awk '$1!="INFO" && $1!="WARN"{A[$6]=$0} END{for(i in A) print A[i]}' RS="" infile

Lord_Spectre · January 30, 2011, 11:15am

Ah! Ok, as always, all in one line is better!
Is there a way to count how many times each TAG occurs? I mean, this is filtering by "PROT_SOCKET_OPEN", "SOCKET_CLOSE_OK", and so on....

I would like to count how many times, for example, the PROT_SOCKET_OPEN event occurs and, if possible, print the count in the output like:

ERROR 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  (120) PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}

vgersh99 · January 30, 2011, 11:55am

nawk '$1!="INFO" && $1!="WARN"{A[$6]=$0;c[$6]++} END{for(i in A) print A[i]," changed [" c[i] "]"}' RS="" myFile

Lord_Spectre · January 30, 2011, 12:06pm

It appends the "counter" at the end of each event, not in the middle as in my example, but it's perfect as is.....
Also, I don't have nawk, but it seems to work with awk too!

Many many thanks to all the people who help me!
Now it's time to study and make experiments with all the suggested commands!

vgersh99 · January 30, 2011, 12:13pm

nawk '$1!="INFO" && $1!="WARN"{A[$6]=$0;c[$6]++} END{for(i in A) {n=index(A[i],ORS);print substr(A[i],1,n) "(" c[i] ")" substr(A[i],n+1)}}' RS="" myFile
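
A variation on the same idea, as a sketch only: it puts the count after the indentation of the second line, closer to the layout requested above. It uses sub() instead of index()/substr(), runs on the four sample records from the severity post, and sev.log, rec and result are placeholder names.

```shell
cat > sev.log <<'EOF'
INFO 2011.01.27 20:59:40.489  com.access.protocol.handler.ServerHandler 1234.1291717116.transitionState
  PROT_SOCKET_OPEN   CP connection 1234.1291717116 has changed state to Init

WARN 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

FATAL 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

ERROR 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}
EOF

result=$(awk 'BEGIN { RS = "" }
# Skip INFO and WARN; keep the last record per tag and count records per tag.
$1 != "INFO" && $1 != "WARN" { A[$6] = $0; c[$6]++ }
END {
    for (i in A) {
        rec = A[i]
        # "&" stands for the matched newline plus indentation;
        # the count in parentheses is appended right after it.
        sub(/\n[ \t]*/, "&(" c[i] ") ", rec)
        print rec
    }
}' sev.log)
printf '%s\n' "$result"
```

With this sample every surviving tag occurs once, so each count is 1. The WARN record is skipped before counting, so it does not add to the SOCKET_CLOSE_OK total.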

Lord_Spectre · January 30, 2011, 12:21pm

hehehehe thank you again vgersh!
Do you like Star Trek? Your nick seems to reveal something about you, vgersh

vgersh99 · January 30, 2011, 12:51pm

I believe it was 'V...ger'

Lord_Spectre · January 30, 2011, 12:55pm

Yep! But the Enterprise crew called it vger (viger)!

grepeverything · January 30, 2011, 2:25pm

This doesn't do exactly what you want but comes close in a simple way. I just mention it as an f.y.i. as to another approach:

grep -A 1 -E '^FATAL|ERROR ' inputfile  | tail -6

Gives output like:

--
FATAL 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener
--
ERROR 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}

Doesn't give the counts like you want. You could get some statistics like this though:

grep -A 1 -E '^FATAL|ERROR ' inputfile | grep -Eo  '([[:upper:]]+_){1,}[[:upper:]]+'| sort |uniq -c

Outputs:

45 PROT_STATE_CHANGE
45 SOCKET_CLOSE_OK

Lord_Spectre · January 31, 2011, 4:15am

Interesting approach, especially the statistics part. I'll use it in my script and probably adapt it to my needs!

Thanks for the feedback!

---------- Post updated 31-01-11 at 04:15 AM ---------- Previous update was 30-01-11 at 02:37 PM ----------

grepeverything:

Doesn't give the counts like you want. You could get some statistics like this though:
grep -A 1 -E '^FATAL|ERROR ' inputfile | grep -Eo  '([[:upper:]]+_){1,}[[:upper:]]+'| sort |uniq -c
Outputs:
45 PROT_STATE_CHANGE
45 SOCKET_CLOSE_OK

Well, is it possible to have the severity in front of the statistic number and then the rest of the log after the tag?
Something like:

FATAL [45] SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener
ERROR [45] PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}
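
One possible sketch for that layout, assuming the two-line record format used earlier in the thread. It groups by tag, counts the FATAL and ERROR records per tag, and prints the severity and second-line text of the last record seen for each tag. The names sev2.log, sev, count, rest and result are placeholders, the sample data is the four records from the severity post, and sort is there only to make the output order predictable.

```shell
cat > sev2.log <<'EOF'
INFO 2011.01.27 20:59:40.489  com.access.protocol.handler.ServerHandler 1234.1291717116.transitionState
  PROT_SOCKET_OPEN   CP connection 1234.1291717116 has changed state to Init

WARN 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

FATAL 2011.01.27 20:58:15.242  com.log.SocketConnection doRun
  SOCKET_CLOSE_OK   Closed socket   node SocketQueryListener

ERROR 2011.01.27 20:59:40.690  java.lang.String EventDispatcher.logEvent
  PROT_STATE_CHANGE   Protocol handler  : connectConfirmState [NetworkAccessGroup.ClientGroup.Client.1234] changed {4}
EOF

result=$(awk 'BEGIN { RS = "" }
$1 == "FATAL" || $1 == "ERROR" {
    key = $6                      # tag, first word of the second line
    sev[key] = $1                 # severity of the last record with this tag
    count[key]++                  # how many FATAL/ERROR records carry this tag
    line2 = substr($0, index($0, "\n") + 1)
    sub(/^[ \t]+/, "", line2)     # drop the indentation of the second line
    rest[key] = line2
}
END { for (k in count) printf "%s [%d] %s\n", sev[k], count[k], rest[k] }' sev2.log | sort)
printf '%s\n' "$result"
```

If the same tag appears under both FATAL and ERROR, this version reports one combined count under whichever severity came last. Using the severity plus the tag as the array key would keep them apart.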