I have a requirement to serialise various rsh scripts that hit my server from an external scheduler. No matter how many scripts come in via rsh, only one should execute at a time and the others should wait.
I have set up the scheduler to call my shell script with the command to be run as a parameter. This shell script is responsible for queuing the commands and executing them one after the other.
The following is the code that I have written. It runs just fine for a few jobs, but as the number of queued jobs increases, the script fails. I am running ksh on Ubuntu 8.04.
Can anyone please tell me if there is anything obviously wrong? I know my request amounts to a review of my code, but I would be grateful if anyone can share any similar code that I could readily use.
I have read another similar post where various alternatives are suggested, but those will not work for me.
#!/usr/bin/ksh
pid_line=$$_$1
queue_file=/tmp/queue_file

# sleep a pseudo-random 0-4 seconds to stagger arrivals
sleep $(echo $$%5|bc)

# append our entry to the end of the queue
echo $pid_line >> $queue_file

# loop while we are not the top-most job in the queue
while [[ $(grep -n $pid_line $queue_file | cut -d: -f1) -ne 1 ]]
do
    # if by some rare chance someone removed us from the queue, rejoin
    if [[ $(grep $pid_line $queue_file | wc -l) -eq 0 ]]
    then
        echo $pid_line >> $queue_file
    fi

    # if the job at the top of the queue was killed or terminated
    # before it could remove itself, remove it ourselves
    curr_pid_line=$(head -1 $queue_file)
    if [ $(ps -ef | grep $(echo $curr_pid_line|cut -d_ -f1) | grep -v grep | wc -l) -eq 0 ]
    then
        sleep 1
        grep -Ev $curr_pid_line $queue_file | cat > $queue_file
    else
        # some other job is running; wait a while before trying again
        sleep 20
    fi
done

# we are at the head of the queue now; run the command
$1
result=$?

# remove ourselves from the queue
grep -vE $pid_line $queue_file | cat > $queue_file

# exit with the exit code of the command we ran
exit $result
Your script isn't protected against simultaneous execution of its critical sections. You need atomic lock operations to make sure one instance isn't modifying your queue_file while it is being processed by another. Shell scripting is probably not the best approach for that kind of thing. Moreover, the "grep something file | cat > file" construct looks bogus to me: "file" will be truncated before it is read. At least that is what it does with ksh on Solaris.
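For what it's worth, a shell can get an atomic lock without dropping to C: mkdir either creates a directory or fails, as a single operation. Here is a minimal sketch; the lock path and retry interval are my own choices, not anything from the script above:

```shell
#!/bin/sh
# Sketch: serialize jobs with an atomic mkdir lock.
# LOCKDIR is an assumed path; pick one on a local filesystem.
LOCKDIR=${LOCKDIR:-/tmp/serialize.lock}

# mkdir creates the directory or fails atomically if it already
# exists, so there is no window in which two instances both
# believe they own the lock.
until mkdir "$LOCKDIR" 2>/dev/null; do
    sleep 1             # another instance holds the lock; retry
done

# --- critical section: only one instance runs this at a time ---
echo "job running in process $$"
# ---------------------------------------------------------------

rmdir "$LOCKDIR"        # release the lock for the next waiter
```

A trap on exit to remove the lock directory would make this more robust if a job can be killed mid-run, otherwise the stale lock blocks everyone.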
I agree that in spite of my logic to re-add accidentally removed jobs back to the queue, there could be some problems.
But I think this is what I have to go ahead with, given that there will never be more than 20 jobs at any given time - anything more elaborate would be overkill for the functional requirement that I have.
I am thinking of writing a C program with named pipes when I have more time and money.
I am surprised that there is no easy way of doing this on Unix.
edit by bakunin: i provided the code-tags you surely have just forgotten - no problem, but please bring them with you the next time. Thank you.
This command works "by accident". Its behavior is undefined. testfile might be cleared before being read depending on unpredictable factors.
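The usual portable fix (my wording, not something from the thread) is to write the filtered output to a temporary file and rename it over the original, so the live file is never truncated while it is being read:

```shell
#!/bin/sh
# Demonstration of removing a line without piping a file into
# itself. The file name and contents are made up for the example.
queue_file=/tmp/queue_demo.$$
printf '%s\n' alpha beta gamma > "$queue_file"

# filter into a temp file first, then rename; rename is atomic
# within one filesystem, so readers see the old or the new file,
# never a half-written one
grep -v '^beta$' "$queue_file" > "$queue_file.tmp" &&
    mv "$queue_file.tmp" "$queue_file"

cat "$queue_file"       # alpha and gamma remain
rm -f "$queue_file"
```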
What makes you believe there is not? Unix bundles a job scheduler, and "at -q xx", with xx as a custom queue in which simultaneous jobs are forbidden, looks suited to this task, although I haven't really tested that approach.
To be honest, I think the whole design is flawed - not only the file piped into itself, as jiliagre has already pointed out (and rightfully so, I might add).
A better way would be to use the filesystem's ability to sort by date: instead of maintaining a file, maintain a directory of timestamped files to manage the jobs. Here is a sketch of a solution I think might work:
#! /bin/ksh
typeset workdir=/path/to/dir
typeset myself=$$
typeset action="$1"                # the job gets passed from outside

touch ${workdir}/job.${myself}     # enqueue the job

while [ "$(ls -rt ${workdir} | head -1)" != "job.${myself}" ] ; do
    sleep 5                        # wait until our job is the oldest in
                                   # the directory, which means we are
                                   # first in the queue
done

# now that we are the first one, do the action:
$action                            # do whatever the job is supposed to do
result=$?

rm ${workdir}/job.${myself}        # remove ourselves from the queue

exit $result                       # pass on the job's exit code
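If portability beyond Linux is not a concern, util-linux's flock(1) does the queueing and the atomicity in one call. A sketch under that assumption (the lock file path is arbitrary):

```shell
#!/bin/sh
# Sketch using flock(1) from util-linux (Linux-specific; not part
# of a base Solaris ksh environment). Every instance blocks on the
# same lock file and the kernel lets them through one at a time.
LOCKFILE=/tmp/scheduler.lock

# -x takes an exclusive lock; -c runs the command once the lock is
# held, and the lock is dropped when the command exits.
flock -x "$LOCKFILE" -c 'echo "job running as $$"'
```

flock passes through the exit status of the command it ran, so the caller still sees the job's result, like in the scripts above.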