Odd results when my script runs from cron

Hi folks,

So I wrote a script to run "top", "awk" out values from the "top" output, and send the results to a data file.

I then set it to run in cron every 15 minutes.

Now I'm noticing that the script and its sub-commands are not always finishing cleanly and, in my investigations, I am also noticing that the script is launched differently than expected from cron. Some of this evidence comes from my ps output, which shows processes lingering after they should have completed, as shown here:

     UID   PID  PPID   C    STIME TTY         TIME CMD
    root 26793   287   0   Feb 21 ?           0:00 sh -c /usr/apps/client/bin/gatherOSKPI.sh
    root 26853 26796  45   Feb 21 ?        6539:10 /usr/apps/client/bin/top -n 1 -q
    root 26859 26796   0   Feb 21 ?           0:00 /usr/bin/sed s/^[       ]*//;s/[        ]*$//
    root 26857 26796   0   Feb 21 ?           0:00 /usr/bin/awk -F; {print $2}
    root 26796 26793   0   Feb 21 ?           0:00 /bin/ksh /usr/apps/client/bin/gatherOSKPI.sh
    root 26854 26796   0   Feb 21 ?           0:00 /usr/bin/head -n 5
    root 26858 26796   0   Feb 21 ?           0:00 /usr/bin/awk -F: {print $2}
    root 26858 26796   0   Feb 21 ?           0:00 /usr/bin/awk -F: {print $2}
   root 26858 26796   0   Feb 21 ?           0:00 /usr/bin/awk -F: {print $2}
    root 23755 23721   0   Feb 23 ?           0:00 /usr/bin/awk -F, {print $3,$4}
    root 23721 23720   0   Feb 23 ?           0:00 /bin/ksh /usr/apps/client/bin/gatherOSKPI.sh
    root 23748 23721  46   Feb 23 ?        3300:18 /usr/apps/client/bin/top -n 1 -q
    root 23720   287   0   Feb 23 ?           0:00 sh -c /usr/apps/client/bin/gatherOSKPI.sh
    root 23756 23721   0   Feb 23 ?           0:00 /usr/bin/awk -F  {print $1","$4}
    root 23751 23721   0   Feb 23 ?           0:00 /usr/bin/head -n 5

Note that sometimes the script is launched with the flag "-c", other times not, and once preceded by "/bin/ksh"

The awk and head output are due to commands in the script.

Can anyone suggest why this behavior occurs?
Largely, I am being given a chance to advance, but I need to find out why these processes are not ending once their run is complete.

Thanks in advance,

Marc

Posting the script does help

I suggest sourcing your .profile in your cron entry right before calling the script:

* * * * * . ~/.profile; /path_to_your_script/your_script

Sorry,

I did not post the script as I was not sure how much space I had.
Here it is:

# Set Default Paths
#
PATH=/usr/apps/client/bin:$PATH; export PATH
LD_LIBRARY_PATH=/usr/apps/client/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
NOSHOME=/usr/apps/client/bin; export NOSHOME

# Execute the NOS provided top program with CPU MEMORY and LOAD values to be extracted to temporary files
#
$NOSHOME/top -n 1 -q | /usr/bin/head -n 5 | /usr/bin/grep CPU | /usr/bin/awk -F"," '{print $1}' | /usr/bin/awk -F":" '{print $2}' | /usr/bin/awk -F" " '{print $1}' > /tmp/cpu.out
$NOSHOME/top -n 1 -q | /usr/bin/head -n 5 | /usr/bin/grep Memory | /usr/bin/awk -F"," '{print $1,$2}' | /usr/bin/awk -F":" '{print $2}' | /usr/bin/awk -F" " '{print $1","$4}' > /tmp/mem.out
$NOSHOME/top -n 1 -q | /usr/bin/head -n 5 | /usr/bin/grep Memory | /usr/bin/awk -F"," '{print $3,$4}' | /usr/bin/awk -F" " '{print $1","$4}' > /tmp/swap.out
$NOSHOME/top -n 1 -q | /usr/bin/head -n 5 | /usr/bin/grep load | /usr/bin/awk -F";" '{print $2}' | /usr/bin/awk -F":" '{print $2}' | /usr/bin/sed 's/^[       ]*//;s/[        ]*$//' > /tmp/loadavg.out

# Gather all data into single file and clean up
#
CPU=`cat /tmp/cpu.out`
MEM=`cat /tmp/mem.out`
SWAP=`cat /tmp/swap.out`
LOAD=`cat /tmp/loadavg.out`
HOST=`/usr/bin/hostname`
SDATE=`/usr/bin/date +%b-%d-%y`
TIME=`/usr/bin/date +%H:%M:%S`
# Configure date values to figure out proper storage of comma delimited values
#
typeset -i MONTH=`/usr/bin/date +%m`
MONTH=$(echo "$MONTH" | tr -d ' ')
typeset -i DAY=`/usr/bin/date +%d`
DAY=$(echo "$DAY" | tr -d ' ')
typeset -i YEAR=`/usr/bin/date +%Y`
YEAR=$(echo "$YEAR" | tr -d ' ')

# Create a variable FILE to concat variables into a single variable to test against
FILE=$MONTH"-"$YEAR-$HOST".dat"

# Check whether the file exists. If so, append the values; otherwise create the needed file and populate it
#
if [ -e $NOSHOME/../data/OSKPI/$FILE ]
  then
    echo $HOST","$SDATE","$TIME","$CPU","$MEM","$SWAP","$LOAD >> $NOSHOME/../data/OSKPI/$MONTH-$YEAR-$HOST.dat
  else
    echo $HOST","$SDATE","$TIME","$CPU","$MEM","$SWAP","$LOAD > $NOSHOME/../data/OSKPI/$MONTH-$YEAR-$HOST.dat
  fi

# Remove all temporary files and exit
#
rm /tmp/cpu.out /tmp/mem.out /tmp/swap.out /tmp/loadavg.out
exit 0

I don't see the shebang line in your script. My guess is that you forgot to write it, that's why you see

 sh -c <scriptname>

-- the Bourne shell is invoked as the default interpreter.

Please include the shebang line:

 #!/bin/ksh

as the very first line of the script and try again.
If it still doesn't work as expected, you should tell us whether the script works as it should when you invoke it from the command line (with no cron).
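
A run from your login shell and a run from cron differ mainly in the environment, because cron does not read your login profile. One rough way to imitate that from the command line is sketched below. The helper name run_like_cron and the PATH value are assumptions made up for illustration; the real cron environment on your system may differ.

```shell
#!/bin/sh
# Hypothetical helper: run a command line with a near-empty environment,
# roughly the way cron would. The PATH below is an assumed minimal value.
run_like_cron() {
    env -i HOME="${HOME:-/}" PATH=/usr/bin:/bin SHELL=/bin/sh \
        /bin/sh -c "$1"
}

# Example: show the PATH that the command actually sees.
run_like_cron 'echo "PATH seen by the job: $PATH"'
```

Calling run_like_cron /usr/apps/client/bin/gatherOSKPI.sh would then show whether the script depends on something that only your interactive shell sets up.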

What caught my eye is

top running for 6539 minutes, i.e. about 4.5 days? I guess the piped processes are not lingering but waiting for top to finish?
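
That guess is easy to reproduce on a small scale. In a pipeline every stage is started at the same time, and the readers stay alive until the writer closes its end of the pipe. A small sketch, where the function name slow_writer_demo and the one-second sleep are made up for illustration:

```shell
#!/bin/sh
# Hypothetical demo: the left side stands in for a "top" that has written
# its output but has not exited yet.
slow_writer_demo() {
    { echo "CPU states: 98.0% idle"; sleep 1; } |
        head -n 5 |
        awk -F: '{ print $2 }'
}

# head has already read the line, but it cannot see end-of-file until the
# writer exits, so the whole pipeline lasts about one second.
slow_writer_demo
```

If the writer never exits, as with a top that keeps spinning on the CPU, the readers behind it show up in ps for as long as it runs.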

ok,

First, mirni...
That was my cut and paste error.
The shebang is:
#!/bin/ksh

I just missed it in the cut and paste.

Thanks for pointing that out though!
Marc


Yes, that would be an issue on a production box but we are beta testing the script on a test box as they're only considering allowing me to increase my duties.

What caught my eye is not only the continual run of the top, but the differences in how cron seems to be initiating the script.

Either it is launched with a "sh -c" or it is run directly (as seen in the Feb 21 result).

My first thought was that top was still running because one of the follow-on commands manipulating top's output had encountered an issue. But I realized that would not make sense, as top is already done before its output can be piped to the next command.

Then I began wondering if there is something that can gum up the pipe so top can't reach it?

Is that possible?

Marc

What system are you on? My top doesn't have -q switch (running linux).
What does -q switch do?

Does the script run ok when you run it from CL (no cron)?

What exactly is your cron entry?
Try to redirect stdout and stderr of the script to a file for debugging. Also specify the MAILTO and SHELL variables in cron, e.g.

MAILTO=<user>@domain.com
SHELL=/bin/ksh
0,15,30,45 * * * * /usr/apps/client/bin/gatherOSKPI.sh >> /home/user/gather.log 2>> /home/user/gatherErr.log

And see whether you will capture anything interesting.

Next step would be to turn on command tracing in the script:

#!/bin/ksh
set -x
...

and/or simplify/comment out the lines to try to find the culprit.
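
Because cron has no terminal, the trace output has to go somewhere you can read it later. A minimal sketch of one way to do that, assuming a hypothetical helper trace_run and a log path of your own choosing:

```shell
#!/bin/sh
# Hypothetical helper: run a command with tracing on and append the trace
# (and any other stderr output) to the given log file.
trace_run() {
    log=$1
    shift
    (
        exec 2>>"$log"   # stderr of this subshell now goes to the log
        set -x           # print each command before it is executed
        "$@"
    )
}

# Example with a throwaway log file; the traced command is written to the log.
trace_run /tmp/trace_demo.log echo "hello from the job"
```

The same two lines (exec 2>>somefile, then set -x) could also go at the top of the script itself, right after the shebang, if you prefer not to wrap it.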

mirni

We are running this on Solaris 2.10

The cron command is:
0,15,30,45 * * * * /usr/apps/client/bin/gatherOSKPI.sh

As I had it explained by the admin that was helping me, the -q flag removes delays when top refreshes, so I am guessing it can be removed. I don't use more than a single response from top in my script. I put it there because the admin recommended it.

I'll check whether I can make use of the MAILTO or set -x methods without interrupting their work. As I said, they are letting me stretch "if" I can learn on my own.

Marc

Are you sure your ps excerpt is correct and complete? I see process 26858 three times, but I am missing the grep command from that pipe. As the top process is hogging CPU time like mad, the -n1 switch seems not to work...
Suspicion: one of the (intermittent) processes that top shows in the first five lines has a character (combination) in it that prevents one of the commands in the pipe from finishing and thus keeps the entire pipe open.
btw - that construct of 4 loooong pipes could (and should, for clarity, simplicity, maintainability) be reduced to one single top execution, piped to one single elaborate awk statement, either printing to those tmp files, or, feeding values to the desired variables immediately.

Rudi,

Actually, I cut the ps results down to what I thought was relevant

As for making the awk more efficient, I'm learning as I go here and not sure how I would combine all three items into one. But I know you're right and hope to learn better as I go.

As for the issue, we've cleaned up the script a bit on our end and killed off the lingering processes. The admin I'm dealing with found some environmental issues (he was not specific) and restarted the cron job for this script. So we are in "watch mode" again, as he says it may have been an unrelated OS issue after all.

So thank you all for your input, but it appears that my script may have only been to blame until the admins could find out what the real issue was.. :\

Thank you all for your help!

Marc

Just to set you on the right track: that repeated top | head -n 5 | grep CPU | awk | awk ... could be reduced to a single top | awk 'NR > 5 {exit} /CPU/ { ... split(...) ... } /Memory/ { ... split(...) ... }' etc.
Did you ever consider the vmstat command (if available on your system), which hands you the desired info on a silver platter and can run at intervals as desired?
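
To make the single-awk idea a bit more concrete, here is a rough sketch. It assumes the top header looks like the made-up sample lines at the bottom, which were guessed from the field positions your awk commands pick out; the real header on your box may differ. The function name parse_top_header is hypothetical too. The script uses sub(), so on Solaris you would probably need nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk.

```shell
#!/bin/sh
# Hypothetical helper: reads the top header on stdin and prints one CSV line:
# cpu,mem_total,mem_free,swap_total,swap_free,load averages
parse_top_header() {
    awk '
        NR > 5 { exit }                  # same cut-off as "head -n 5"
        /load/ {                         # "...;  load avg: a, b, c; ..."
            split($0, part, ";")
            split(part[2], kv, ":")
            load = kv[2]
            sub(/^[ \t]+/, "", load)     # trim leading blanks
            sub(/[ \t]+$/, "", load)     # trim trailing blanks
        }
        /CPU/ {                          # "CPU states: 98.0% idle, ..."
            split($0, kv, ":")
            split(kv[2], word, " ")
            cpu = word[1]
        }
        /Memory/ {                       # "Memory: A phys mem, B free mem, ..."
            split($0, kv, ":")
            n = split(kv[2], item, ",")
            for (i = 1; i <= n; i++) {
                split(item[i], word, " ")
                mem[i] = word[1]         # keep only the leading value
            }
        }
        END { print cpu "," mem[1] "," mem[2] "," mem[3] "," mem[4] "," load }
    '
}

# Made-up sample header, standing in for the output of "top -n 1 -q"
printf '%s\n' \
    'last pid: 12345;  load avg:  0.10,  0.09,  0.09;  up 3+04:05:06  12:00:00' \
    '70 processes: 69 sleeping, 1 on cpu' \
    'CPU states: 98.0% idle,  1.0% user,  1.0% kernel,  0.0% iowait,  0.0% swap' \
    'Memory: 16G phys mem, 2048M free mem, 8G total swap, 8G free swap' |
    parse_top_header
```

In the real script this would be one line, $NOSHOME/top -n 1 -q | parse_top_header, so top runs once instead of four times and no temporary files are needed.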