Serializing script failing as more commands are queued

I have a requirement to serialise various rsh scripts that hit my server from an external scheduler. No matter how many scripts come in via rsh, only one should execute at a time and the others should wait.

I have set up the scheduler to call my shell script with the command to be run as a parameter. This shell script is responsible for queuing the commands and executing them one after the other.
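
For context, the scheduler ends up issuing a call along these lines (the host, script name and job path are placeholders, not the real ones):

rsh appserver /usr/local/bin/serialize.sh "/opt/batch/nightly_load.sh"

The whole command is passed as a single quoted argument, which is why the script below refers to it simply as $1.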

The following is the code that I have written. It runs just fine for a few jobs, but as the number of queued jobs increases, the script fails. I am running ksh on Ubuntu 8.4.

Can anyone please tell me if there is anything obviously wrong? I know this amounts to a request for a code review, but I would also be grateful if anyone can share any similar code that I could readily use.

I have read another similar post where various alternatives are suggested, but those will not work for me.

#!/usr/bin/ksh

pid_line=$$_$1
queue_file=/tmp/queue_file

# sleep for a pseudo-random 0-4 seconds (derived from the PID)
sleep $(echo $$%5 | bc)

# append our entry to the end of the queue
echo $pid_line >> $queue_file

# loop for as long as we are not the top-most job in the queue
while [[ $(grep -n $pid_line $queue_file | cut -d: -f1) -ne 1 ]]
do
    # if by any rare chance someone removed us from the queue, join back
    if [[ $(grep $pid_line $queue_file | wc -l) -eq 0 ]]
    then
        echo $pid_line >> $queue_file
    fi

    # if the job at the top of the queue was killed or terminated before it
    # could remove itself, remove it on its behalf
    curr_pid_line=$(head -1 $queue_file)
    if [ $(ps -ef | grep $(echo $curr_pid_line | cut -d_ -f1) | grep -v grep | wc -l) -eq 0 ]
    then
        sleep 1
        grep -Ev $curr_pid_line $queue_file | cat > $queue_file
    else
        # some other job is still running, wait a while before checking again
        sleep 20
    fi
done

# we are now the first job in the queue, so run the command
$1
result=$?

# remove ourselves from the queue
grep -vE $pid_line $queue_file | cat > $queue_file

# exit with the exit code of the command that we ran
exit $result

Your script isn't protected against simultaneous execution of its critical sections. You need atomic lock operations to make sure one instance isn't modifying your queue_file while another one is processing it. Scripting is probably not the best approach for that kind of thing. Moreover, the "grep something file | cat > file" construct looks bogus to me: "file" will be erased before it is read. That is at least what it does with ksh on Solaris.
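
For example, a minimal sketch of such an atomic lock around every update of the queue file could look like this (the lock directory name and the retry interval are arbitrary choices; the same take_lock/release_lock pair would also go around the removal, not just the append shown here):

lock_dir=/tmp/queue_file.lock

take_lock() {
    until mkdir "$lock_dir" 2>/dev/null
    do
        sleep 1        # someone else holds the lock, retry
    done
}

release_lock() {
    rmdir "$lock_dir"
}

take_lock
echo $pid_line >> $queue_file
release_lock

mkdir either creates the directory or fails, never anything in between, so only one instance at a time can get past take_lock.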

That command works with ksh - I am hoping it will work on Solaris also.

$ cp /etc/passwd testfile
$ grep -E abcd testfile | cat > testfile
$ cat testfile
abcd:x:100:100:ABCD,,,:/home/abcd:/bin/bash

I think the critical sections are those that are removing and adding lines to the queue.
That is,

echo $pid_line >> $queue_file 

and

grep -vE $pid_line $queue_file | cat > $queue_file

I agree that in spite of my logic to re-add accidentally removed jobs to the queue, there could still be some problems.

But I think this is what I have to go ahead with, given that there will never be more than 20 jobs at any given time; anything more elaborate would be overkill for my functional requirement.

I am thinking of writing a C program using named pipes when I have more time and money.

I am surprised that there is no easy way of doing this on Unix.
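
For what it's worth, the named-pipe idea can also be sketched in the shell itself rather than C. Everything below (the FIFO path, the daemon structure) is only illustrative:

#!/usr/bin/ksh
# single consumer, started once; serialises whatever is written to the FIFO
fifo=/tmp/job_queue.fifo
[ -p $fifo ] || mkfifo $fifo

while true
do
    # read one command per line and run it; the next job is not
    # started before the current one has finished
    while read -r cmd
    do
        eval "$cmd"
    done < $fifo
done

Submitters would then simply do

print "/opt/batch/nightly_load.sh" > /tmp/job_queue.fifo

The drawback is that submission is decoupled from completion, so the caller no longer waits for the job or gets its exit status, which the original script does provide.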


This command works "by accident". Its behavior is undefined. testfile might be cleared before being read depending on unpredictable factors.
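
A race-free way to rewrite the file in place is to write to a temporary file first and only rename it over the original once grep has finished (the .tmp suffix is just an example):

grep -vE "$pid_line" "$queue_file" > "${queue_file}.tmp" && mv "${queue_file}.tmp" "$queue_file"

Renaming a file within the same filesystem is atomic, so other instances see either the old or the new queue file, never a half-written one.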

What makes you believe there is not? Unix bundles a job scheduler, and "at -q xx", with xx being a custom queue in which simultaneous jobs are forbidden, looks suited to this task, although I didn't really test that approach.
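
A hypothetical submission with a dedicated queue letter might look like this (the job path and the queue letter z are placeholders; whether atd really runs the jobs of one queue strictly one at a time is something to verify on the target system):

echo "/opt/batch/nightly_load.sh" | at -q z now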

To be honest, I think the whole design is flawed, not just the pipe back into the same file, which jiliagre has already pointed out (and rightly so, I might add).

A better way to do this would be to use the filesystem's ability to sort by date: instead of maintaining a queue file, maintain a directory with (timestamped) marker files to manage the jobs. Here is a sketch of a solution I think might work:

#! /bin/ksh

typeset workdir=/path/to/dir
typeset myself=$$
typeset action="$1"    # the job gets passed from outside

touch ${workdir}/job.${myself}     # enqueue the job

while [ "$(ls -rt ${workdir} | head -1)" != "job.${myself}" ] ; do
     sleep 5     # wait until our job is the oldest in the directory,
                    # which means we are the first in queue
done

# now that we are the first one do the action:

$action   # do whatever the job is supposed to do

rm ${workdir}/job.${myself}  # remove ourselves from the queue

exit 0
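
If the caller also needs the job's exit status, as the original script provides, the end of the sketch could be extended along these lines:

$action                        # do whatever the job is supposed to do
result=$?                      # remember its exit status

rm ${workdir}/job.${myself}    # remove ourselves from the queue

exit $result                   # pass the job's status back to the caller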

I hope this helps.

bakunin