I want my script NOT to send an e-mail if it finds the same keyword more than twice.

My script triggers an e-mail if any of the keywords supplied to it is found.
The problem is that if it finds the same keyword continuously (due to continuous server errors), it keeps triggering mails and fills up my mailbox with the same message, which is not required.
I want my script NOT to send an e-mail if it finds the same keyword more than twice.

Following is my script

#!/bin/sh
TOTAL_LINES=0
DELTA=0
ERR=0
LINE_NO=`cat /sybase/B10/temp.txt`        # line count recorded by the previous run
echo $LINE_NO
TOTAL_LINES=`more /sybase/B10/ASE-15_0/install/B10.log | wc -l`   # current size of the errorlog
echo $TOTAL_LINES
DELTA=`expr $TOTAL_LINES - $LINE_NO`      # number of new lines since the previous run
echo $DELTA
ERR=`tail -$DELTA /sybase/B10/ASE-15_0/install/B10.log | egrep -i "infected|signal|error|warning|severity|fail|suspect|corrupt|deadlock|critical|problem|threshold|NO_LOG|logsegment|stacktrace|encountered"| grep -v "background task error"|wc -l`
echo $ERR
if [ "ERR" -gt 0 ]; then
 tail -$DELTA  /sybase/B10/ASE-15_0/install/B10.log  | egrep -i "infected|signal|error|severity|fail|suspect|corrupt|deadlock|critical|problem|threshold|NO_LOG|logsegment|stacktrace"| grep -v "background task error" | mailx -s "Errors found in B10.log. Pls check" rajeshneemkar@gmail.com
rm /sybase/B10/temp.txt
echo "$TOTAL_LINES" >/sybase/B10/temp.txt"
fi

Any help in this regard is highly appreciated.
Best Regards,
Rajesh

Just to clarify the purpose of the script - this script is run periodically (via cron?), and when it runs, it looks for keywords in the "new" part of the logfile (whatever was appended since the last run), and, on finding them, is supposed to send you one email?

---------- Post updated at 08:54 AM ---------- Previous update was at 08:52 AM ----------

And by "finds the keyword twice", do you mean you've gotten more than two emails for the same keyword? Or for any of the keywords? What would clear the condition - not finding the error when the script is run?

Yes, the script is run via CRON (every minute).
Yes, it looks for keywords in the "new" part of the log file.
Yes, I received more than 500 copies of the same e-mail when the same error occurred more than 500 times.
The number of e-mails is proportional to the number of (identical) error messages, which I don't want. (I want only one e-mail if the error is recurring.)

Thanks
Rajesh

---------- Post updated at 09:41 AM ---------- Previous update was at 09:23 AM ----------

Consider the following situation

JS: js__callout: Not yet time for sjobid 137, calloutid 471531
JS: js__callout: Not yet time for sjobid 137, calloutid 471531
JS: js__callout: Not yet time for sjobid 137, calloutid 471531
JS: js__callout: Not yet time for sjobid 137, calloutid 471531

I receive 4 e-mails if the same error is repeated.
Now imagine the situation with 500 errors (identical and continuous).

What I'm looking for is this: if the script detects the same error it detected earlier (due to continuous errors generated by the server), it should not send me an e-mail a second time. Just one e-mail is enough.

This is just to keep my inbox from filling up.

The thread is not AIX specific (moving it) and please continue to use code tags, thanks.

I may be misunderstanding your requirements but wouldn't:-

  1. create a log of emails sent
  2. match lines from the "new" part of the source log against the email log
  3. on no hit, send an email and append the line to the emails-sent log; else get the next line

be simpler?
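A minimal sketch of that approach (dropped in where your current mailx call is, reusing your $DELTA; the SENTLOG path and the shortened keyword list are assumptions):

SENTLOG=/sybase/B10/emails_sent.log
touch "${SENTLOG}"
tail -$DELTA /sybase/B10/ASE-15_0/install/B10.log \
| egrep -i "error|severity|deadlock" \
| grep -v "background task error" \
| while IFS= read -r LINE; do
    if ! grep -Fxq "$LINE" "${SENTLOG}"; then          # not mailed before
      echo "$LINE" | mailx -s "Errors found in B10.log. Pls check" rajeshneemkar@gmail.com
      echo "$LINE" >> "${SENTLOG}"                     # remember what was sent
    fi
  done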

I guess I still don't understand your requirements. You show four lines above and say you only want one message for those four lines (not four messages), but none of the keywords in either of the egrep commands in your script are in those four lines. So, why should any message be sent?

Let me see if I understand what you're trying to do. You're running your log processor every minute, and you're only looking at new messages in the log file each minute. So, are you saying you want a mail message each minute of the day in which that error is reported? With the sixteen keywords in your 1st egrep (there are only fourteen in the 2nd egrep ) do I correctly understand that you're OK getting 16 messages each minute for each minute of the day (for a maximum of 23,040 messages in your mailbox each day) as long as no two messages sent by any invocation of your log checker contain the same keyword? Are the lines that you show above supposed to generate mail messages or not? (As I said before, none of your keywords are present in those lines.)

Why are you counting sixteen keywords as errors but only sending e-mail for fourteen of those keywords?

Are the lines really that completely identical? You can just remove duplicates:

awk '! ($0 in A) { A[$0] ; print }' inputfile
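Plugged into your existing pipeline it would look like this (a sketch only; it suppresses duplicates within a single run, not across runs, so keep your existing $ERR check around it):

tail -$DELTA /sybase/B10/ASE-15_0/install/B10.log \
| egrep -i "infected|signal|error|severity|fail|suspect|corrupt|deadlock|critical|problem|threshold|NO_LOG|logsegment|stacktrace" \
| grep -v "background task error" \
| awk '! ($0 in A) { A[$0] ; print }' \
| mailx -s "Errors found in B10.log. Pls check" rajeshneemkar@gmail.com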

I guess I still don't understand your requirements. You show four lines above and say you only want one message for those four lines (not four messages),

Pls check the attachment for the detailed error messages.
Exact situation: there was a problem, the server was reporting it every second, and it was Friday, my weekend.
So after checking my mails on Monday, i.e., 2 days later, I had received
60x60x60 = 216000 !! e-mails for the SAME message.
I do not want the same email that many times. If the messages are different, it is absolutely fine.

but none of the keywords in either of the egrep commands in your script are in those four lines. So, why should any message be sent?

This is an example only, not the exact situation. You can consider "JS_callout" as a keyword here in this example.

Let me see if I understand what you're trying to do. You're running your log processor every minute, and you're only looking at new messages in the log file each minute. So, are you saying you want a mail message each minute of the day in which that error is reported?

Yes, my script runs every minute, checks the NEW lines in the database server errorlog for the errors (keywords), and sends an e-mail to my outlook account IF ANY are found.

With the sixteen keywords in your 1st egrep (there are only fourteen in the 2nd egrep ) do I correctly understand that you're OK getting 16 messages each minute for each minute of the day (for a maximum of 23,040 messages in your mailbox each day) as long as no two messages sent by any invocation of your log checker contain the same keyword? Are the lines that you show above supposed to generate mail messages or not?

NO, I do not want emails for recurring messages within a small duration.
I'm OK if it sends the same error message (if it recurs) once every hour.

Why are you counting sixteen keywords as errors but only sending e-mail for fourteen of those keywords?
You can ignore the keywords part here. It is working fine and as per the requirement.

I guess your process has a faulty design from the beginning.

Running the log check every minute makes sense only if you react within the minute. Estimate your reaction time and reduce the job's frequency accordingly.

Composing the grep ... command pipe will trigger one single mail if any of the keywords is found any no. of times in the NEW part of the logfile, which requires the new lines to be identified correctly. You may want to reconsider that.

Creating a sensible schedule and setup for the log check, you should receive, say, 48 mails for the weekend, unless you reduce the frequency even further for out-of-office hours.
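For instance (the script path is just an assumption), a crontab entry running the check hourly instead of every minute:

0 * * * * /sybase/B10/check_B10_log.sh    # hourly instead of * * * * *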

Rudic,

I could not understand your statement "I guess your process has a faulty design from the beginning."

Running the log check every minute makes sense only if you react within the minute. Estimate your reaction time and reduce the job's frequency accordingly.

This is meaningful, and I will consider changing the frequency of the CRON job.

Composing the grep ... command pipe will trigger one single mail if any of the keywords is found any no. of times in the NEW part of the logfile, which requires the new lines to be identified correctly. You may want to reconsider that.

I'm already aware of this.

Creating a sensible schedule and setup for the log check, you should receive, say, 48 mails for the weekend, unless you reduce the frequency even further for out-of-office hours.

You mean running the job every hour? I will think about this.

Can you address the following situation:

Imagine the CRON job is scheduled for every 10 minutes

The following are the new entries in the errorlog
JS: failed to receive jsagent response
Job Scheduler Task lost its Agent connection (Error 0)
Job Scheduler Task connected with Agent on port 4903

I will receive only 1 email for this. Agreed and accepted.

Now, at the next run (20 min later), I receive the same new error messages in the errorlog
JS: failed to receive jsagent response
Job Scheduler Task lost its Agent connection (Error 0)
Job Scheduler Task connected with Agent on port 4903

And again in the next run, the same scenario, and so on.

Can you let me know how we can stop the script from sending e-mails on the second run and thereafter (for the same-error-messages scenario)?

This is exactly what I'm trying to find out.

Hope you understand my question

-
Rajesh

How about doing a sort | uniq -c on your newly grepped results and the old extract (saved somewhere), and mailing the error if the count is 1?
That would suppress identical lines; but a deviation of the slightest iota would issue a mail.
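A rough sketch of that idea (the file names are assumptions; current_hits.txt holds this run's grepped lines, previous_hits.txt the lines saved by the last run):

touch previous_hits.txt                                   # first run: nothing saved yet
sort -u current_hits.txt  > current.sorted
sort -u previous_hits.txt > previous.sorted
sort current.sorted previous.sorted | uniq -c \
| awk '$1 == 1 { sub(/^ *1 /, ""); print }' > once.txt    # lines seen in only one of the two extracts
if [ -s once.txt ]; then
  grep -Fxf once.txt current.sorted                       # drop lines that merely disappeared since last run
else
  :
fi > to_mail.txt
if [ -s to_mail.txt ]; then
  mailx -s "Errors found in B10.log. Pls check" rajeshneemkar@gmail.com < to_mail.txt
fi
mv current.sorted previous_hits.txt                       # remember this run for the next comparison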

Proposed solution:

#! /bin/sh

HOST=`uname -n`
MAILTO=rajeshneemkar@gmail.com

LOGFILE=/sybase/B10/ASE-15_0/install/B10.log
LOGBASE=`basename ${LOGFILE} .log`
LOGSENT=${LOGBASE}.notified
LOGLAST=${LOGBASE}.lastline

LOOKFOR="corrupt"
LOOKFOR="${LOOKFOR}|critical"
LOOKFOR="${LOOKFOR}|deadlock"
LOOKFOR="${LOOKFOR}|error"
LOOKFOR="${LOOKFOR}|fail"
LOOKFOR="${LOOKFOR}|infected"
LOOKFOR="${LOOKFOR}|logsegment"
LOOKFOR="${LOOKFOR}|no_log"
LOOKFOR="${LOOKFOR}|problem"
LOOKFOR="${LOOKFOR}|severity"
LOOKFOR="${LOOKFOR}|signal"
LOOKFOR="${LOOKFOR}|stacktrace"
LOOKFOR="${LOOKFOR}|suspect"
LOOKFOR="${LOOKFOR}|threshold"

IGNORED="background task error"

#-------------------------------------------------------------------------------
# initialize notification and lastline files

touch ${LOGSENT}

if [ ! -f ${LOGLAST} ]; then # start at the beginning
  echo 0 > ${LOGLAST}
fi

LASTPREV=`cat ${LOGLAST}`
LASTLINE=`wc -l < ${LOGFILE}`
echo ${LASTLINE} > ${LOGLAST}

if [ ${LASTLINE} -lt ${LASTPREV} ]; then # file has been trimmed
  LASTPREV=0
fi

#-------------------------------------------------------------------------------
# extract lines of interest

TMPFILE=`mktemp`
NEWFILE=`mktemp`

trap "rm -f ${TMPFILE} ${NEWFILE}" EXIT

if [ -f ${LOGFILE} ]; then
  tail -n +$(( ${LASTPREV} + 1 )) ${LOGFILE} \
  | egrep -i  "${LOOKFOR}" \
  | egrep -iv "${IGNORED}" \
  | sort -u
else
  echo ${HOST}:${LOGFILE}: no such file or directory
fi > ${TMPFILE}

comm -23 ${TMPFILE} ${LOGSENT} > ${NEWFILE} # new error messages

if [ -s ${NEWFILE} ]; then # send out new messages
  sort -o ${LOGSENT} ${LOGSENT} ${NEWFILE} # update message cache
  mailx -s "Errors - ${LOGFILE##*/} on ${HOST}" < ${NEWFILE}
fi

Some details:

LOGFILE=/sybase/B10/ASE-15_0/install/B10.log
LOGBASE=`basename ${LOGFILE} .log`
LOGSENT=${LOGBASE}.notified
LOGLAST=${LOGBASE}.lastline

The notified file keeps track of error messages that have already been reported. This file needs to be removed once the errors have been cleared.
Remove the lastline file if you want to rescan the entire logfile.
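Note also that LOGSENT and LOGLAST are built from the basename only, so they are created relative to the current working directory; when running from cron, cd to a fixed directory first. A hypothetical crontab entry (script name and directory are assumptions):

*/10 * * * * cd /sybase/B10 && /sybase/B10/logwatch.sh    # keeps B10.notified and B10.lastline in one place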

LOOKFOR="corrupt"
LOOKFOR="${LOOKFOR}|critical"
    ....snip....
IGNORED="background task error"

Just to make it easier to keep track of what is looked for and what is ignored.

tail -n +$(( ${LASTPREV} + 1 )) ${LOGFILE} \
  | egrep -i  "${LOOKFOR}" \
  | egrep -iv "${IGNORED}" \
  | sort -u

Get the unique error messages that have been logged, starting from the line after the previous last line of the logfile.

comm -23 ${TMPFILE} ${LOGSENT} > ${NEWFILE}

The list of new error messages (lines in the current extract that are not yet in the notified file).
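A small illustration of what comm -23 does here (made-up messages; both inputs must be sorted, which the sort -u and sort -o above take care of):

printf 'deadlock detected in db1\nerror 605 on table t1\n' > current.sorted
printf 'deadlock detected in db1\n' > notified.sorted
comm -23 current.sorted notified.sorted    # prints only "error 605 on table t1", the not-yet-mailed message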

LOOKFOR="\
corrupt
critical
deadlock
error
fail
infected
logsegment
no_log
problem
severity
signal
stacktrace
suspect
threshold"

and a plain grep should work just as well, since each line of the variable is treated as a separate pattern.
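A quick way to check that (the sample line is made up):

echo "Error: 926, Severity: 14, State: 1" | grep -i "${LOOKFOR}"    # matches via both "error" and "severity"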

MiG - good alternative. But I was not sure which version of *IX was being used.

Had I thought about this longer, I would have used two files:

if [ -f ${LOGFILE} ]; then
 tail -n +$(( ${LASTPREV} + 1 )) ${LOGFILE} \
 | grep -Eif "${LOOKFORS}" \
 | grep -Eivf "${IGNORES}" \
 | sort -u
else

${LOOKFORS} is a file that would list what is being looked for:

\<corrupt\>
\<critical\>
  ... snip ...
\<suspect\>
\<threshold\>

and ${IGNORES} would contain what can be ignored:

background task error

Command line options to select the email address, lookfors file, ignores file, etc., should also be used (imho). Something like:

logwatch [-e emailaddr] [-l lookfors] [-i ignores] [-n notified] [-L lastline] logfile

but this will be left as an exercise for the reader.
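For what it's worth, a minimal getopts skeleton for that exercise could look like this (option letters follow the usage line above, with -L for the lastline file; defaults and validation omitted):

while getopts e:l:i:n:L: opt; do
  case ${opt} in
    e) MAILTO=${OPTARG} ;;
    l) LOOKFORS=${OPTARG} ;;
    i) IGNORES=${OPTARG} ;;
    n) LOGSENT=${OPTARG} ;;
    L) LOGLAST=${OPTARG} ;;
    *) echo "usage: logwatch [-e emailaddr] [-l lookfors] [-i ignores] [-n notified] [-L lastline] logfile" >&2
       exit 1 ;;
  esac
done
shift $(( OPTIND - 1 ))
LOGFILE=$1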

---------- Post updated at 07:41 AM ---------- Previous update was at 05:36 AM ----------

Rajeshneemkar,

You should also consider having a second cronjob that runs daily or weekly and removes the ${LOGSENT} file. This would ensure that you still get notified at least once a day (or week) of a recurring problem.
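A hypothetical crontab entry for that cleanup (the path depends on where the script's state files are kept):

0 6 * * * rm -f /sybase/B10/B10.notified    # forget what was already reported, once a day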

  • DL

---------- Post updated at 07:55 AM ---------- Previous update was at 07:41 AM ----------

I also missed the attached logfile, so please change:

tail -n +$(( ${LASTPREV} + 1 )) ${LOGFILE} \
| egrep -i  "${LOOKFOR}" \

to:

tail -n +$(( ${LASTPREV} + 1 )) ${LOGFILE} \
| sed -e 's/^[^>]*> //' \
| egrep -i  "${LOOKFOR}" \

which will strip off the timestamps.
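For example (the line format is only an assumption about the attached log), a line such as

May 19 2013 10:15:01.23 server> JS: failed to receive jsagent response

would be reduced to

JS: failed to receive jsagent response

so that the ever-changing timestamp prefix no longer defeats the duplicate detection.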