I have a ksh script (dtksh Version M-12/28/93d on Solaris 10) that is run daily by cron and sometimes hangs forever. I need to detect whether an old copy is still hung before the new run starts, and if so, send an email and exit the script. Here is part of the code:
#!/usr/dt/bin/dtksh
PROGNAME=$(basename "$0")
req_ver=93d
if [[ ! ${.sh.version} || (${.sh.version##*/} < $req_ver) ]] ; then
print "$PROGNAME requires ksh $req_ver or higher" ; exit 1 ;
fi
count=$(pgrep -z global $PROGNAME | wc -l | awk '{print $1}')
if [[ $count -gt 1 ]] ; then
mailx -s "Houston we have a problem" ...
exit
fi
Here is the problem, and I am really stumped:
Even when this script is the only instance, $count still equals 2.
Debugging shows that:
pgrep emits one PID
wc -l returns "       1 " (seven spaces, the digit 1, and a single space; I need to trim the spaces before the test, hence the awk command)
and then awk returns 2
??? !!! ???
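To rule out the trimming stages themselves, they can be fed exactly one line with no pgrep involved. This is only a minimal sketch, assuming a POSIX-style shell with printf available; the variable names (padded, trimmed_awk, trimmed_tr) are illustrative and not part of the real script.

```shell
# Feed exactly one line of input through the same counting and trimming
# stages used in the script, but without pgrep or ps in the pipeline.
# Variable names here are hypothetical, chosen only for this check.
padded=$(printf '12345\n' | wc -l)
trimmed_awk=$(printf '12345\n' | wc -l | awk '{print $1}')
trimmed_tr=$(printf '12345\n' | wc -l | tr -d '[:blank:]')

# Brackets make any leading or trailing blanks visible.
printf 'wc -l alone : [%s]\n' "$padded"
printf 'via awk     : [%s]\n' "$trimmed_awk"
printf 'via tr      : [%s]\n' "$trimmed_tr"
```

If both trimmed values come back as 1 here, then awk and tr are not turning a 1 into a 2, and the extra line must already be present in whatever pgrep or ps emits at the moment the command substitution runs inside the script.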
I see the same thing when using these versions of the command:
count=$(pgrep -z global $PROGNAME | wc -l | tr -d '[:blank:]')
count=$(ps -z global | grep $PROGNAME | grep -v grep | wc -l | awk '{print $1}')
When I run these checks interactively at the shell prompt against existing processes, the result is always correct. I have checked and double-checked, and it seems that the digit 1 surrounded by spaces is being piped into either tr or awk, and both return the digit 2 with no spaces.
Help will be really appreciated.