No alarm triggered when backup tape is full

Hi folks, I have run into an issue: no alarm is triggered when the backup tape is full on our Solaris server.
The tape capacity is 72GB.
There is an alarm-generating shell script on the server which is supposed to raise an alarm when it detects that the tape is full.
I should then receive SMS and email alerts.
This worked for quite some time; the problem only started a few months ago.
I have checked that no one has modified the shell script.
The alerting system that sends out the SMS and email alerts works fine for other alarms, just not for the tape-full alarm.
Is there anything else I can check on the server to troubleshoot this issue?
Could it be that the detection is failing somewhere?

How is the script executed? Through cron?
If so, did you check the cron logs?
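
In case it helps: on Solaris the cron log is normally /var/cron/log rather than anything under /var/log, and it may only be readable by root. A minimal sketch for pulling out the most recent entries that mention the script; the function name cron_entries and the CRON_LOG variable are made up for illustration:

```shell
#!/bin/sh
# Assumption: default Solaris cron log location; override CRON_LOG if yours differs.
CRON_LOG=${CRON_LOG:-/var/cron/log}

# Print the last few cron log lines that mention $1 (e.g. the script name).
cron_entries() {
    if [ ! -r "$CRON_LOG" ]; then
        echo "cannot read $CRON_LOG (root may be needed)" >&2
        return 1
    fi
    grep "$1" "$CRON_LOG" | tail -5
}
```

Usage would be something like: cron_entries tape_backup.sh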

Are you saying that this used to work? Or have your backups just got bigger over time and now tape is filling up?

What backup utility is being used? tar, cpio, ufsdump, or what?

Some utilities require you to tell them how big the tape is on the command line.

Yes, the script is run through cron as follows:
# Backup /data directories into tape device
#==========================================
0 5 * * * /export/home/provadm/script/tape_backup.sh >/dev/null 2>&1
#

I was unable to find any cron logs in /var/log.
Can you advise which directory I should look in for the cron logs?
Thanks.

---------- Post updated at 12:15 PM ---------- Previous update was at 12:05 PM ----------

Yes, this was working fine a year ago.
I tested it by filling up a tape with dummy files which I created in the directory, until it reached the capacity of the tape (72GB). However, no alarm was triggered.

I use tar as the backup utility.

---------- Post updated at 12:43 PM ---------- Previous update was at 12:15 PM ----------

Below is the shell script which was created by the previous engineer:

#!/bin/sh
#set -x

VERSION="Version: 1.06"
######################
# Choose the HOME dir
#####################
#HOME=/export/home/oracle
HOME=/export/home/provadm
BINDIR=$HOME/bin
LOGDIR=$HOME/log
SCRDIR=$HOME/script
CONFDIR=$HOME/config
TEMP=$HOME/temp
#CONFIG_FILE="$CONFDIR/logInfo.cfg"
#######################
# Choose the BACKUP dir
#######################
#LSMS dir is /data1
BACKUPDIR=/data
#PGW,RPT,LNPDB dir is /data
#BACKUPDIR=/data
#BACKUPDIR2=/data/data1
#LOGS=`cat $CONFIG_FILE | grep -v ";" | /bin/awk '{print $2}'`
#LOGS=`echo $LOGS`
newhost=`hostname |sed 's/LIV_//'| sed 's/_01//'`
TAPE_DEVICE=/dev/rmt/0
TAPE_DEVICE_APPEND=/dev/rmt/0n
datestr=`date +%d-%m-%Y`
timestr=`date +%H:%M:%S`
event_day=`/bin/date +%Y%m%d`
currdate=`$HOME/bin/lastdate +1 | awk '{printf "%s",substr($0,1,6)}'`
LOG=$LOGDIR/TAPE/$currdate/tape_bk_$event_day.log
TMP_LOG=$TEMP/tape_backup.tmp
# Format yyyymmdd
yest01=`$BINDIR/lastdate 1 | /bin/awk '{print $1}'`
# Format MMM
mm=`$BINDIR/lastdate 1 | /bin/awk '{print $2}' | cut -c1-3`
# Format DD
dd=`$BINDIR/lastdate 1 | /bin/awk '{print $1}' | cut -c7-8`
# Format YYYY
yyyy=`$BINDIR/lastdate 1 | /bin/awk '{print $1}' | cut -c1-4`
# Format yyyymm
yestmonth01=`$BINDIR/lastdate 1 | /bin/awk '{print $1}' | cut -c1-6`
# ==============
# Date and Month
# ==============
REP_MONTH=`date "+%Y%m"`
REP_DATE=`date "+%Y%m%d"`
FIRST_DAY=`$BINDIR/lastdate | /bin/awk '{print $1}' | cut -c7-8`
chktape()
{
status=`/bin/mt -f $TAPE_DEVICE_APPEND status | grep 'sense' | awk '{print substr($1,1,5)}'`
}
tarcmd()
{
# Check the $TMP_LOG file if exists, if yes remove the file
if [ -f $TMP_LOG ]; then
rm $TMP_LOG
fi
#/data1/ACCESS
#All Servers
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/ACCESS/$yestmonth01/*$yest01*.tar.Z >> $LOG 2>>$TMP_LOG

if [ $FIRST_DAY = "01" ]; then
#/data1/BACKUP
# All Servers
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/BACKUP/$yestmonth01/*_$yestmonth01.* >> $LOG 2>>$TMP_LOG
fi

#/data/BACKUP
# LNPDB, RPT
#/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/BACKUP/$yestmonth01/*$yest01*.tar.gz >> $LOG 2>>$TMP_LOG

#/data1/SYS
# All Servers
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/SYS/$yestmonth01/$yest01*.Z >> $LOG 2>>$TMP_LOG
#/data1/TDR1
# LSMS, RPT
#/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/TDR1/$yestmonth01/$yest01*.Z >> $LOG 2>>$TMP_LOG
#/data1/TDR1_M
# LSMS
#/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/TDR1_M/$yestmonth01/$yest01*.TDR1_M.Z >> $LOG 2>>$TMP_LOG
#/data/TDR2
# PGW, RPT
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/TDR2/$yestmonth01/$yest01*.*.TDR2.Z >> $LOG 2>>$TMP_LOG
# /data/TDR2_M
# PGW
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/TDR2_M/$yestmonth01/$yest01*.TDR2_M.Z >> $LOG 2>>$TMP_LOG

# /data/IPMP
# LSMS, PGW, LNPDB, RPT
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/IPMP/$yestmonth01/IPMP.log.$yest01.*.Z >> $LOG 2>>$TMP_LOG

}
# Main
status=""
chktape
if [ "$status" = "sense" ]
then
# Got tape inside
# Clear alarm
$SCRDIR/gen_alarm.sh 21 0 "$newhost No Tape detected"
# Start doing backup
echo "" | /bin/tee -a $LOG
echo "Tape Backup Start ($datestr $timestr) ..." | /bin/tee -a $LOG
echo "" | /bin/tee -a $LOG
#echo "Logs folder: [$LOGS]"
#for logfolder in $LOGS
#do
tarcmd >> $LOG
#done
if [ $? -ne 0 ]
then
# Put the stderr message generated from tarcmd() into the $LOG file
cat $TMP_LOG >> $LOG
echo "" | /bin/tee -a $LOG
echo "Tape Backup was failed" >> $LOG
#cd $BINDIR
$SCRDIR/gen_alarm.sh 23 1 "$newhost Tape backup error"
else
# Put the stderr message generated from tarcmd() into the $LOG file
cat $TMP_LOG >> $LOG
echo "" | /bin/tee -a $LOG
echo "Tape Backup Succeeded" >> $LOG
#cd $BINDIR
$SCRDIR/gen_alarm.sh 23 0 "$newhost Tape backup error"
fi
EOT=`cat $TMP_LOG | grep 'tar: write error: unexpected EOF' | wc -l`
if [ $EOT -ge 1 ]; then
# Tape Full
# Raise alarm
$SCRDIR/gen_alarm.sh 22 1 "$newhost Tape full"
echo "" | /bin/tee -a $LOG
echo "Tape Full ($datestr $timestr) ..." | /bin/tee -a $LOG
echo "" | /bin/tee -a $LOG
else
# Tape NOT Full
# Clear alarm
$SCRDIR/gen_alarm.sh 22 0 "$newhost Tape full"
fi
echo "" | /bin/tee -a $LOG
datestr=`date +%d-%m-%Y`
timestr=`date +%H:%M:%S`
echo "Tape Backup End ($datestr $timestr) ..." | /bin/tee -a $LOG
else
# No tape inside
# Raise alarm
$SCRDIR/gen_alarm.sh 21 1 "$newhost No Tape detected"
echo "" | /bin/tee -a $LOG
echo "Tape Backup Fail ($datestr $timestr) ..., unable to find tape" | /bin/tee -a $LOG
echo "" | /bin/tee -a $LOG
fi
# Housekeep the tape_bk_yyyymmdd.log
hk_month=`$BINDIR/lastdate 121 | /bin/awk '{print $1}' | /bin/cut -c1-6`
\rm $LOGDIR/TAPE/$currdate/tape_bk_$hk_month*.log
find $TEMP -name 'tape_backup.tmp' -exec rm {} \;

The script appears to append up to 6 "tar" archives each run to a tape. All commands use the "no rewind" tape device, so if the tapes are cycled one would expect the tape to eventually become full. Maybe you recycle tapes with another process?

If you always use new backup tapes straight out of the wrapper, please mention this. Also please mention how big you think the total of the 6 "tar" archives is.

I'm not so sure that just testing "status" (with "mt") and then looking for a single error message from "tar" is sufficient to determine whether the tape is already full or where "tar" has failed.
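
As a rough sketch of a broader check (an untested assumption, not a fix for this script): match any "tar: write error" line in the captured stderr instead of one exact message, because the text after "write error:" can differ from the "unexpected EOF" wording the script greps for. The function name tape_write_error is made up for illustration, and a write error does not by itself prove the tape is full.

```shell
#!/bin/sh
# Succeed (status 0) if the file named in $1 contains any tar write error.
# $1 would be the file holding tar's stderr, e.g. $TMP_LOG in the script.
tape_write_error() {
    errfile=$1
    # An empty or missing file means tar reported nothing on stderr.
    [ -s "$errfile" ] || return 1
    # Plain grep with output discarded; avoids relying on "grep -q".
    grep 'tar: write error' "$errfile" >/dev/null 2>&1
}
```

It could then be used as: if tape_write_error $TMP_LOG; then ... raise the alarm ...; fi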

For a problem tape, try running the command:

mt -f /dev/rmt/0n status

and then see whether there is any clue that the tape is already full.
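
Note that chktape in the script only checks that the word "sense" appears in the status output, so any sense key passes that test. A small sketch for pulling out the actual sense key, assuming the drive prints a line of the form "sense key(0x3)= Media Error   residual= 0   retries= 0" (other drives may format this differently; the function name sense_key is made up):

```shell
#!/bin/sh
# Read "mt status" output on stdin and print "<hex key> <description>".
# Prints nothing if no line matches the assumed format.
sense_key() {
    sed -n 's/.*sense key(\(0x[0-9a-fA-F]*\))= *\(.*[^ ]\) *residual=.*/\1 \2/p'
}
```

Usage: mt -f /dev/rmt/0n status | sense_key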

Also, the messages from "tar" (if it ran at all) are probably in the main backup log pointed to by $LOG.

Note: The cron command line redirection to /dev/null hides any output which is not trapped by the script. This would include any script syntax errors or untrapped output from commands.
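
While debugging, one option is to send that output to a file instead of /dev/null. A sketch of the crontab entry; the log file name tape_backup_cron.out is hypothetical, pick any path the provadm user can write to:

```text
# Backup /data directories into tape device (output kept for debugging)
0 5 * * * /export/home/provadm/script/tape_backup.sh >>/export/home/provadm/log/tape_backup_cron.out 2>&1
```

Anything the script does not redirect itself, including shell syntax errors, should then end up in that file.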

The whole process would need very good record keeping if you wanted to restore a particular file from a particular named archive in a particular tape partition.

If the intention is only to ever have one backup set on one tape, then the script may need to start by rewinding the tape. Depends on what your backup strategy is and whether you change the tapes on a wide backup cycle like most sites.
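
If one backup set per tape is what you want, a rewind step at the start could look roughly like this. It is only a sketch: the MT variable and the function name rewind_tape are made up, and whether rewinding is right at all depends on your backup strategy, since the next tar will overwrite what is already on the tape.

```shell
#!/bin/sh
# Assumptions: mt lives in /bin as in the script, and the same no-rewind device is used.
MT=${MT:-/bin/mt}
TAPE_DEVICE_APPEND=${TAPE_DEVICE_APPEND:-/dev/rmt/0n}

# Rewind the tape and report the outcome; returns non-zero on failure.
rewind_tape() {
    if "$MT" -f "$TAPE_DEVICE_APPEND" rewind; then
        echo "tape rewound"
    else
        echo "rewind failed" >&2
        return 1
    fi
}
```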

I couldn't reformat the original post of the script because it contains too much extraneous HTML.

Here is the original script minus any surplus commented-out lines and indented for readability. This is not a corrected script, just a diagnostic so we can read it!

#!/bin/sh
#set -x

VERSION="Version: 1.06"
######################
# Choose the HOME dir
######################
HOME=/export/home/provadm
BINDIR=$HOME/bin
LOGDIR=$HOME/log
SCRDIR=$HOME/script
CONFDIR=$HOME/config
TEMP=$HOME/temp
#######################
# Choose the BACKUP dir
#######################
BACKUPDIR=/data
newhost=`hostname |sed 's/LIV_//'| sed 's/_01//'`
TAPE_DEVICE=/dev/rmt/0
TAPE_DEVICE_APPEND=/dev/rmt/0n
datestr=`date +%d-%m-%Y`
timestr=`date +%H:%M:%S`
event_day=`/bin/date +%Y%m%d`
currdate=`$HOME/bin/lastdate +1 | awk '{printf "%s",substr($0,1,6)}'`
LOG=$LOGDIR/TAPE/$currdate/tape_bk_$event_day.log
TMP_LOG=$TEMP/tape_backup.tmp
# Format yyyymmdd
yest01=`$BINDIR/lastdate 1 | /bin/awk '{print $1}'`
# Format MMM
mm=`$BINDIR/lastdate 1 | /bin/awk '{print $2}' | cut -c1-3`
# Format DD
dd=`$BINDIR/lastdate 1 | /bin/awk '{print $1}' | cut -c7-8`
# Format YYYY
yyyy=`$BINDIR/lastdate 1 | /bin/awk '{print $1}' | cut -c1-4`
# Format yyyymm
yestmonth01=`$BINDIR/lastdate 1 | /bin/awk '{print $1}' | cut -c1-6`
# ==============
# Date and Month
# ==============
REP_MONTH=`date "+%Y%m"`
REP_DATE=`date "+%Y%m%d"`
FIRST_DAY=`$BINDIR/lastdate | /bin/awk '{print $1}' | cut -c7-8`

chktape()
{
status=`/bin/mt -f $TAPE_DEVICE_APPEND status | grep 'sense' | awk '{print substr($1,1,5)}'`
}

tarcmd()
{
# Check the $TMP_LOG file if exists, if yes remove the file
if [ -f $TMP_LOG ]; then
	rm $TMP_LOG
fi
#/data1/ACCESS
#All Servers
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/ACCESS/$yestmonth01/*$yest01*.tar.Z >> $LOG 2>>$TMP_LOG

if [ $FIRST_DAY = "01" ]; then
	#/data1/BACKUP
	# All Servers 
	/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/BACKUP/$yestmonth01/*_$yestmonth01.* >> $LOG 2>>$TMP_LOG
fi

#/data/BACKUP
# LNPDB, RPT
#/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/BACKUP/$yestmonth01/*$yest01*.tar.gz >> $LOG 2>>$TMP_LOG

#/data1/SYS
# All Servers
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/SYS/$yestmonth01/$yest01*.Z >> $LOG 2>>$TMP_LOG

#/data/TDR2
# PGW, RPT
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/TDR2/$yestmonth01/$yest01*.*.TDR2.Z >> $LOG 2>>$TMP_LOG

# /data/TDR2_M
# PGW
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/TDR2_M/$yestmonth01/$yest01*.TDR2_M.Z >> $LOG 2>>$TMP_LOG

# /data/IPMP
# LSMS, PGW, LNPDB, RPT
/bin/tar cvf $TAPE_DEVICE_APPEND $BACKUPDIR/IPMP/$yestmonth01/IPMP.log.$yest01.*.Z >> $LOG 2>>$TMP_LOG
}

# Main
status=""
chktape
if [ "$status" = "sense" ]
then
	# Got tape inside
	# Clear alarm
	$SCRDIR/gen_alarm.sh 21 0 "$newhost No Tape detected"
	# Start doing backup
	echo "" | /bin/tee -a $LOG
	echo "Tape Backup Start ($datestr $timestr) ..." | /bin/tee -a $LOG
	echo "" | /bin/tee -a $LOG
	tarcmd >> $LOG
	if [ $? -ne 0 ]
	then
		# Put the stderr message generated from tarcmd() into the $LOG file
		cat $TMP_LOG >> $LOG
		echo "" | /bin/tee -a $LOG
		echo "Tape Backup was failed" >> $LOG
		$SCRDIR/gen_alarm.sh 23 1 "$newhost Tape backup error"
	else
		# Put the stderr message generated from tarcmd() into the $LOG file
		cat $TMP_LOG >> $LOG
		echo "" | /bin/tee -a $LOG
		echo "Tape Backup Succeeded" >> $LOG
		$SCRDIR/gen_alarm.sh 23 0 "$newhost Tape backup error"
	fi

	EOT=`cat $TMP_LOG | grep 'tar: write error: unexpected EOF' | wc -l`
	if [ $EOT -ge 1 ]; then
		# Tape Full
		# Raise alarm
		$SCRDIR/gen_alarm.sh 22 1 "$newhost Tape full"
		echo "" | /bin/tee -a $LOG
		echo "Tape Full ($datestr $timestr) ..." | /bin/tee -a $LOG
		echo "" | /bin/tee -a $LOG
	else
		# Tape NOT Full
		# Clear alarm
		$SCRDIR/gen_alarm.sh 22 0 "$newhost Tape full"
	fi

	echo "" | /bin/tee -a $LOG
	datestr=`date +%d-%m-%Y`
	timestr=`date +%H:%M:%S`
	echo "Tape Backup End ($datestr $timestr) ..." | /bin/tee -a $LOG
else
	# No tape inside
	# Raise alarm
	$SCRDIR/gen_alarm.sh 21 1 "$newhost No Tape detected"
	echo "" | /bin/tee -a $LOG
	echo "Tape Backup Fail ($datestr $timestr) ..., unable to find tape" | /bin/tee -a $LOG
	echo "" | /bin/tee -a $LOG
fi

# Housekeep the tape_bk_yyyymmdd.log
hk_month=`$BINDIR/lastdate 121 | /bin/awk '{print $1}' | /bin/cut -c1-6`
\rm $LOGDIR/TAPE/$currdate/tape_bk_$hk_month*.log
find $TEMP -name 'tape_backup.tmp' -exec rm {} \;
	tarcmd >> $LOG
	if [ $? -ne 0 ]

This bit looks very dodgy. A shell function returns the status of the last command it ran, so $? here only reflects the final tar in tarcmd; a failure in any earlier tar is lost. If you want to test $?, you need to detect the error inside the function as soon as it happens, then issue a return 1.

Conclusion: error handling for tar failures in the function tarcmd is effectively absent.
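
To illustrate what I mean, here is a cut-down sketch of tarcmd that remembers a failure from any tar and returns it. It is not a corrected version of the script: only two of the tar steps are shown, the helper run_tar and the TAR variable are made up, and the LOG and TMP_LOG defaults are placeholders.

```shell
#!/bin/sh
# Placeholders so the sketch can be tried on its own; the real script sets these.
TAR=${TAR:-/bin/tar}
LOG=${LOG:-/tmp/tape_bk.log}
TMP_LOG=${TMP_LOG:-/tmp/tape_backup.tmp}

# Run one tar (args: device, then files) and record any failure in $rc.
run_tar() {
    "$TAR" cvf "$@" >> "$LOG" 2>> "$TMP_LOG" || rc=1
}

# Run each tar step; return non-zero if any of them failed.
tarcmd() {
    rc=0
    run_tar "$TAPE_DEVICE_APPEND" $BACKUPDIR/ACCESS/$yestmonth01/*$yest01*.tar.Z
    run_tar "$TAPE_DEVICE_APPEND" $BACKUPDIR/SYS/$yestmonth01/$yest01*.Z
    return $rc
}
```

With this shape, the existing "if [ $? -ne 0 ]" test after the tarcmd call would see a failure from any of the tar commands, not just the last one.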

Thanks for your useful advice, methyl.
By the way, when I tar a dummy file (4824483840 bytes) onto a blank tape on the Solaris server, I get:

provadm@BDK_OLP_NPOLP_01:./tmp%tar cvf /dev/rmt/0n /tmp/masterfile
a /tmp/masterfile 9422820 tape blocks

What does 9422820 refer to?
Is it the number of bytes of masterfile that were written to the tape?

Those are 512-byte blocks.
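
A quick way to check the arithmetic (the function name blocks_to_bytes is made up; awk with %.0f is used so the result does not depend on 32-bit integer arithmetic):

```shell
#!/bin/sh
# Convert a count of 512-byte tape blocks to bytes.
blocks_to_bytes() {
    echo "$1" | awk '{ printf "%.0f\n", $1 * 512 }'
}

blocks_to_bytes 9422820    # prints 4824483840
```

That matches the 4824483840-byte size of the dummy file, so tar reported the whole file.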

Thanks, Scrutinizer, for your answer.

I have another question.

Just now I tarred a dummy file onto the existing blank tape on the Solaris server:

provadm@BDK_OLP_NPOLP_01:./tmp%tar cvf /dev/rmt/0n /tmp/masterfile
a /tmp/masterfile 2000000 tape blocks
tar: write error: I/O error

Later on, I checked the tape status as follows:

provadm@BDK_OLP_NPOLP_01:./tmp%mt -f /dev/rmt/0n status
HP DAT-72 tape drive:
   sense key(0x3)= Media Error   residual= 0   retries= 0
   file no= 0   block no= 0

Media error: what does that imply?
Is the tape corrupted?

The tape error could just be because it's a blank tape.