awk to find lines containing word that occur multiple times

i have a script that scans a log file every 10 minutes. this script remembers the last line of the log and then uses it to continue monitoring the log when it runs again 10 minutes later.

the script searches the log for a string called MaxClients.

now, how can i make it so that when the script runs, it'll let me know when it finds not just 1 but 3 lines containing MaxClients?

awk 'FNR>=5034' logfile | egrep MaxClients

sounds simple enough. but there's a catch. let's say the log file contains:

Warning MaxClient is having issues - please
MaxClients java error...yams potato turkey
MaxClients error Fatal exception known known
MaxClients error Fatal exception known known
Could not complete. Error found. MaxClient initiated
MaxClients error Fatal exception known known
Could not complete. Error found. MaxClient initiated

in the above example, the script should only output the lines:

MaxClients error Fatal exception known known
MaxClients error Fatal exception known known
MaxClients error Fatal exception known known

This is because it is the only line that occurred at least 3 times when the script scanned the log at its 10-minute interval.

grep "^MaxClients error Fatal exception known known" file

This should do!

A pointer (maybe it's 90% of the solution!):

awk '/MaxClients/{a[$0]++}END{for(i in a) if(a[i]>=3) for(j=1;j<=a[i];j++) print i}' file
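
To see how this counting approach behaves on the sample log from the question, here is a small sketch. The function name repeated_lines, its three arguments (file, starting line, minimum count) and the temporary file are assumptions made for illustration; the NR >= start condition stands in for the "continue from the last line seen" part of the script.

```shell
# Hypothetical helper: print every MaxClients line that occurs at least
# "min" times at or after line "start" of the given file.
# Usage: repeated_lines FILE START MIN
repeated_lines() {
    awk -v start="$2" -v min="$3" '
        # Tally identical lines, keyed by the full line text.
        NR >= start && /MaxClients/ { count[$0]++ }
        END {
            # Print each qualifying line as many times as it occurred.
            for (line in count)
                if (count[line] >= min)
                    for (j = 1; j <= count[line]; j++)
                        print line
        }' "$1"
}

# Sample data copied from the question, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
Warning MaxClient is having issues - please
MaxClients java error...yams potato turkey
MaxClients error Fatal exception known known
MaxClients error Fatal exception known known
Could not complete. Error found. MaxClient initiated
MaxClients error Fatal exception known known
Could not complete. Error found. MaxClient initiated
EOF

out=$(repeated_lines "$log" 1 3)
printf '%s\n' "$out"
rm -f "$log"
```

With this input only the "MaxClients error Fatal exception known known" line reaches the threshold, so it is printed three times. Because the lines are printed in the END block, they come out grouped by content rather than in log order.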

First of all, there is no need to pipe awk output to egrep. You can drop that pipeline because awk can do the pattern matching itself.

Will this code work for you?

awk 'NR>=5034&&/^MaxClients error/{++c}c==3{o=$0 RS $0 RS $0; print o; exit 1}' logfile 

this looks like it'll work, but what happens if there are more than 3 occurrences? i think this code will only show the first 3 lines.

i should have been clearer in my post. sorry for that.

If you want to print each time the number of occurrences reaches 3, then remove the exit 1 statement and reset the counter with c=0:

awk 'NR>=5034&&/^MaxClients error/{++c}c==3{o=$0 RS $0 RS $0; print o; c=0}' logfile

Now the code will print every time 3 occurrences are found. If this is not what you want, modify it as per your requirement. I hope this helps.
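
As a quick check of how the reset version behaves when there are more than 3 matches, the sketch below feeds it seven matching lines. The input is made up for illustration and the NR>=5034 condition is dropped so the short input is scanned from the first line.

```shell
# Made-up input: seven identical matching lines plus one non-matching line.
input=$(
    for n in 1 2 3 4 5 6 7; do
        echo "MaxClients error Fatal exception known known"
    done
    echo "Could not complete. Error found. MaxClient initiated"
)

# Same counter logic as the one-liner: "c" resets after every third match.
result=$(printf '%s\n' "$input" | awk '
    /^MaxClients error/ { ++c }
    c == 3 { o = $0 RS $0 RS $0; print o; c = 0 }')

printf '%s\n' "$result"
```

Seven matches give two complete groups of three, so six lines are printed and the seventh match is left uncounted until two more arrive. Note that the code prints the third matching line three times, which equals the group itself only when the matching lines are identical.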


Thank you. one last question. is the below the best way to include search strings i want to exclude?

awk 'NR>=1&&/MaxClients/ && !/java|could not.*problem found|panic() failure seen|aborting [ERROR]/ {++c}c==3{o=$0 RS $0 RS $0; print o; c=0}' logfile 

in the above, i'm basically saying i want to ignore all lines that contain MaxClients that also contain any of these

"java|could not.*problem found|panic() failure seen|aborting [ERROR]".

Why don't you simply look for lines starting with /^MaxClients error/ ?

But if that will not work, then yes you can use:

/MaxClients/ && !/java/ && !/failure/ etc.
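
One thing to watch for in the combined exclusion pattern from the question: awk uses extended regular expressions, so [ERROR] is a bracket expression matching a single E, R or O, and () is grouping syntax, not literal parentheses (how an empty group behaves can differ between awk implementations). Matching is also case-sensitive, so "could not" does not match "Could not". A sketch with the special characters escaped, using made-up log lines:

```shell
# Made-up lines to exercise the exclusions. The "aborting Request" line
# should be kept: it would be wrongly excluded by the unescaped [ERROR],
# which matches the single letter R.
kept=$(printf '%s\n' \
    'MaxClients java error' \
    'MaxClients panic() failure seen' \
    'MaxClients aborting [ERROR]' \
    'MaxClients aborting Request 12' \
    'MaxClients error Fatal exception known known' |
awk '/MaxClients/ &&
     !/java|could not.*problem found|panic\(\) failure seen|aborting \[ERROR\]/')

printf '%s\n' "$kept"
```

Only the last two input lines survive the filter. Keeping all exclusions in one alternation or splitting them into separate !/.../ tests is mostly a readability choice; the escaping is needed either way.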