Notify as soon as an error is encountered in a script

Hi,

The script below works and emails me the log once it completes, but I also want it to notify me by email as soon as it encounters any error whatsoever.

cat test.list
hdisk0         00a6351a2c832da1                    rootvg          active
hdisk1         00a6351a2c832f66                    rootvg          active
hdisk2         00a6351a2c833311                    optvg           active
hdisk3         00a6351a2c8334a5                    optvg           active
hdisk4         00a6351a2cbf3049                    optvg           active
##########################################
cat NEWDISK.list
rootvg hdisk6 hdisk0
rootvg hdisk7 hdisk1
optvg  hdisk8 hdisk2
optvg  hdisk9 hdisk3
optvg  hdisk10 hdisk4
##########################################
cat replacepv.sh
#!/bin/ksh
cat test.list | while read DISK1 PVID VG
 do
  if grep $DISK1 NEWDISK.list | read  VG DISK2
   then
        echo "Replace DISK $DISK1 to $DISK2........" | tee -a LOG.OUT
        sudo  replacepv $DISK1 $DISK2  | tee -a ERR.OUT
     fi
 done
echo "" | tee -a LOG.OUT
echo "" | tee -a ERR.OUT

mail -s "Disk replaced" user@abc.com< LOG.OUT

Thanks,
mbak

Please be more explicit about which errors should trigger an additional email.

Is it an error that you are invoking replacepv with three operands while the man page only specifies what happens when two operands are given? For example, when you read the 1st line from test.list (setting DISK1 to hdisk0 ), you will be running the command:

        sudo  replacepv hdisk0 hdisk6 hdisk0  | tee -a ERR.OUT

Is that an error? Does the above replacepv command complete successfully? Does it print any diagnostic messages? Does it write anything into ERR.OUT ?

Is it an error when your grep command matches two or more lines and you only process one of those lines? For example, when you read the 2nd line from test.list , the command:

        grep hdisk1 NEWDISK.list

will match the following lines from NEWDISK.list :

rootvg hdisk7 hdisk1
optvg  hdisk10 hdisk4 

but you only read the 1st matched line and run the command:

        sudo  replacepv hdisk1 hdisk7 hdisk1

ignoring the 2nd line and not running the command:

        sudo  replacepv hdisk1 hdisk10 hdisk4

Is that an error? Or is it an error that the above grep command matched hdisk10 on the 2nd line, when I assume you only wanted exact matches for the word hdisk1 ?
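
To make the substring problem concrete, here is a small sketch that recreates the NEWDISK.list shown above in a scratch file and counts the matches with and without whole-word matching. The scratch file and the variable names are only for illustration, and -w is assumed to be available in your grep.

```shell
# Recreate the sample NEWDISK.list from the original post in a scratch file.
list=$(mktemp)
cat > "$list" <<'EOF'
rootvg hdisk6 hdisk0
rootvg hdisk7 hdisk1
optvg  hdisk8 hdisk2
optvg  hdisk9 hdisk3
optvg  hdisk10 hdisk4
EOF

# Plain grep treats hdisk1 as a substring, so the hdisk10 line matches too.
plain_count=$(grep -c hdisk1 "$list")

# grep -w only matches hdisk1 when it appears as a whole word.
word_count=$(grep -wc hdisk1 "$list")

echo "plain matches: $plain_count, whole-word matches: $word_count"
rm -f "$list"
```

With this sample data the plain search finds two lines and the whole-word search finds one.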

Or, did you just want to mail any lines appended to ERR.OUT (even though ERR.OUT does not contain any diagnostic messages that might have been written to explain what errors had been detected)?

My apologies for missing out a value in the script, it should have been as below,

if grep -w $DISK1 NEWDISK.list | read  VG DISK2 DISK3

Since I'm replacing multiple disks, I want to get an email (or set up an alert) whenever the "replacepv" command fails for any reason, such as a disk not being found, before the script tries to run "replacepv" on the next disk in the sequence. In other words, notify me as soon as it fails at any point during script execution. I also noticed that ERR.OUT doesn't log anything if I use a disk that doesn't exist, even though an error is shown on screen.

Please be more specific about what you're asking.

What system hardware are we talking about?
What RAID controller?

Are you asking....
How to interrogate the RAID controller?
How to trap the error in the script?

When you say you want mail as soon as it fails, do you mean you want to receive an email that very second? Email systems (daemons) don't work that fast, and neither do mail relays if your inbox isn't on the same system. It could take a few minutes.

Oh, and which operating system is it?

OS=AIX
Basically, I'm just trying to trap the error in the script and log it to ERR.OUT before mailing out ERR.OUT via mail command.

So can't you test the exit status of the command to see if it's non-zero and, if so, email the log to yourself there and then?

I'm assuming that the RAID command obeys usual practice and gives a non-zero exit status if it errors.
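
A minimal sketch of that idea, so it can be tried without touching real disks: replace_disk and notify are hypothetical stand-ins for "sudo replacepv" and the mail command, and hdisk99 is a made-up disk that is treated as missing. The output is appended to the log with a redirect rather than piped through tee, so the exit status tested is that of the command itself, and 2>&1 puts the diagnostics in the log too (without it they only go to the screen).

```shell
# Sketch only: replace_disk stands in for "sudo replacepv" and
# notify stands in for the mail command. Both are hypothetical.
ERRLOG=$(mktemp)

replace_disk() {
    # Pretend that a disk named hdisk99 does not exist.
    if [ "$1" = "hdisk99" ]; then
        echo "replace_disk: $1 not found" >&2
        return 1
    fi
    echo "replaced $1 with $2"
}

notify() {
    # In the real script: mail -s "$1" user@abc.com < "$ERRLOG"
    echo "ALERT: $1"
}

failures=0
for pair in "hdisk0 hdisk6" "hdisk99 hdisk7" "hdisk2 hdisk8"; do
    set -- $pair    # split the pair into $1 (old disk) and $2 (new disk)
    # Append stdout and stderr to the log, then test the exit status.
    if ! replace_disk "$1" "$2" >> "$ERRLOG" 2>&1; then
        failures=$((failures + 1))
        notify "replacement of $1 failed"
    fi
done

# Count the diagnostics that reached the log, then clean up.
logged_errors=$(grep -c "not found" "$ERRLOG")
echo "failures: $failures, logged errors: $logged_errors"
rm -f "$ERRLOG"
```

The alert goes out inside the loop, right after the failing command, and the loop then carries on with the next disk. Put an exit after notify if the script should stop at the first failure instead.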

This may work, assuming I understand your requirements correctly.

Add the following lines to the start of your program:

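# Mail the error log whenever a command fails
err_code() {
    mail -s "Problem with disk replacement" user@abc.com < ERR.OUT
}
trap 'err_code' ERR   # run err_code when a command returns non-zero
set -e                # exit the script after a failing command
set -o pipefail       # a pipeline fails if any command in it fails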

What should now happen is that if the disk replacement program fails, the shell fires its ERR trap (a shell pseudo-signal rather than a real signal sent to the process), runs the function err_code , and then exits.

The command set -e tells the shell to exit with an error if a command fails (you may want to put your loop, or just the sudo line, between a set -e / set +e pair of commands so that a failure in some other command doesn't end the script; the ERR trap itself stays active until you reset it with trap - ERR ). This isn't enough on its own, however, because a pipeline's exit status is that of its last command, and the tee command will exit with no error. The set -o pipefail command makes the pipeline fail if the sudo command fails.
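
The effect on a pipeline's exit status can be checked with a short sketch. Here false stands in for the failing sudo replacepv and cat stands in for tee. Not every shell accepts pipefail, so the sketch probes for it in a subshell first. It is worth running the same probe with the ksh your script actually uses.

```shell
# A pipeline normally reports the exit status of its last command (cat here,
# standing in for tee), so the failure of "false" is hidden.
false | cat
without=$?

# Probe in a subshell whether this shell accepts pipefail before relying on it.
if (set -o pipefail) 2>/dev/null; then
    # With pipefail set, the failing "false" makes the whole pipeline fail.
    with_pipefail=$(set -o pipefail; false | cat; echo $?)
else
    with_pipefail="unsupported"
fi

echo "without pipefail: $without"
echo "with pipefail: $with_pipefail"
```

If the probe reports unsupported, redirecting the command's output to the log instead of piping it through tee avoids the problem altogether.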

Hope that helps.

Andrew

Thanks to hicksd8 and Andrew for giving me a couple of options to work with, much appreciated.