Use of flock command for whole script

I'm changing my mindset from a few big processes moving data from a few sources under an external, dependency-based scheduler to multiple processes moving data from many sources, run from each client's cron and possibly interfering with each other. It has the benefit of more granular code, but I'm worried about tripping myself up. The incoming data will be delivered (atomically) into a known directory, but we cannot predict when, so the script that processes and removes the files runs every few minutes. The problem is that it could run for a long time if we have a huge volume of data. I am therefore looking at locking mechanisms.

The flock manual page refers to wrapping commands in parentheses, redirecting a file descriptor to a file, something like this:-

(
    flock -s 200
    # ... commands executed under lock ...
) 200>/var/lock/mylockfile

It has been suggested elsewhere that one could use braces { & } to achieve the same and avoid spawning another process. Sadly, both of these would make my code a little ugly, because I want an exclusive lock in force for almost the whole script. It's not massive (only about 600 lines), but it adds an indentation level I'd rather not have, and I'm torn because I want to be good about indenting code too.
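For reference, the brace form being suggested would look something like this (same lock file path, but with an exclusive lock this time):

```shell
{
    flock -x 200 || exit 1            # take an exclusive lock on fd 200
    # ... commands executed under lock ...
} 200>/var/lock/mylockfile
```

Unlike the parenthesised form, the brace group runs in the current shell, so no subshell is spawned; note that the redirection still goes after the closing brace.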

Additionally, whilst debugging, I don't want to have to worry about where the end of the block of code is, or about generating very confusing syntax errors.

I have discovered that the following seems to work just as well and doesn't leave me feeling I should be indenting the code wrapped in parentheses/braces, or worrying when looking to grow/debug it:-

#define any functions first

# Set/test lock
exec 200>/var/lock/mylockfile
flock -w 1 -x 200 || exit                 # Do not run concurrently

# My main script starts here
sleep 10                                  # Just for illustration, honest!
# My main script ends here

flock -u 200                              # Is this necessary?

The script is simply run by cron and I don't want more than one at a time, hence the exclusive lock. There would never need to be output written to the lock file.

So, my questions:-

  • Am I breaking the rules here or leaving myself open to an unexpected failure?
  • Do I need to close file descriptor 200, given that the script just ends?
  • Do I even need to bother unlocking at the end of the script?
  • Is there a problem of time between the exec and the flock?

Thanks in advance,
Robin

Hmm. The idea of flock is to wrap a single command (usually) for the duration of the lock. If the command fails midway, the caller (flock) still releases the lock.

A six-hundred-line script is doing an awful lot of stuff for the duration of the lock - just a point. There must be a large number of possible points of failure - what happens on a busy system? For your structure to work, you want a single thread of active execution, and you are using the lock as a traffic signal.

So is your lock request at a suitably granular level? I do not know; I'm guessing no.
But you do have to unlock no matter what. You probably need a simple master gate script to take care of minding the lock and to call your 600-line beastie.


I think the _exit(2) of the shell is sufficient, no need to explicitly unlock.

However... will you ever leave background processes? The descriptor would be passed to them and, depending on the exact locking mechanics lockf(1) uses, the lock might live on until those processes exit too.

Do you ever delete the lock file? If so, there's a possibility of a race.

Would a utility wrapper like

exec 3> thelock || exit
lockf -w1 -x3 || exit
the_real_job 3>&- &
wait

simplify/beautify things? Note fd 3 is closed instead of passed to the child.

Juha

Meh, I'm an idiot...

&
wait

serves no purpose :-)


Thanks jim mcnamara,

Yes, it is a bit of a beastie, but I do want to ensure there is only one of the whole thing running. It's collating data from multiple sources, and there could be multiple inputs per source. The script itself runs fine, but could be slow for large volumes of data.

Talk about me missing the blindingly obvious solution to stop me worrying. That is indeed a very sensible solution.

Thanks also to Juha Nurmela, good stuff, but sadly these are all C functions and I'm just in the shell. Sorry, I should have said.

If it matters, it's bash on CentOS v6.

Kind regards,
Robin

Surely the shell also uses those system calls.

There are oddities in file locking, see the manpages if you care. One oddity is that if you open a file, lock it, open it again and close that latter fd, the lock is lost.

I got flock and lockf mixed up above, sorry. Here on FreeBSD, and the "scriptable" locker is called lockf.
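Translated to Linux flock(1), the wrapper sketched earlier would read something like this (the_real_job stands in for the actual script):

```shell
exec 3>/var/lock/mylockfile || exit 1
flock -w 1 -x 3 || exit 1              # give up after one second if already locked
the_real_job 3>&-                      # close fd 3 so the child does not inherit the lock
```

Closing fd 3 in the child means any background processes it leaves behind cannot keep the lock alive.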

Juha

Can you use inotify on Linux systems?
If files are moved, as you say, atomically into a known (existing?) directory, inotifywait should catch the exact moment using moved_to.

Perhaps even eliminating the need for locks and cron jobs.

A script run from screen, perhaps: when the moved_to action stops for N <insert your time>, do your magic, or something like that.

It should simplify the procedure, but will be limited to Linux systems.
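As a sketch of the idea (assumes the inotify-tools package is installed; /data/incoming and process_one are placeholders, not anything from the original script):

```shell
#!/bin/bash
INCOMING=/data/incoming                 # placeholder for the known delivery directory

# Emit one line per file moved (atomically) into the directory and
# process each as it arrives -- no polling from cron needed.
inotifywait -m -e moved_to --format '%f' "$INCOMING" |
while read -r file; do
    process_one "$INCOMING/$file"       # process_one is a hypothetical handler
done
```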

Hope that helps
Best regards
Peasant.

As long as you just have a single script that wants to be sure that only one copy of it is running at a time, you can use the shell's no clobber option to create a lock file something like:

#!/bin/bash
IAm=${0##*/}
LockFile="/tmp/$IAm.lock"

# Set lock and verify that no other copy of this code is already running...
set -C	# set no clobber option

# Create the lock file...  This will fail if the lock file already exists.
if ! echo "PID $$ holds the lock." > "$LockFile"
then	echo "$IAm: Another $IAm is already running." >&2
	cat "$LockFile" >&2
	echo "$IAm: Aborting!" >&2
	exit 1
fi

# We now hold an exclusive lock to the lock file (as long as other processes
# wanting to create this lock file also use set -C when trying to create this
# lock file).
set +C	# clear no clobber option

# Set up to remove the lock file when we're done.
trap 'rm -f "$LockFile"' EXIT

echo "$IAm: $$ running."
sleep 60
echo "$IAm: $$ quitting."

This should work with any POSIX conforming shell.

Just be aware that if you use kill -9 to kill this script, you'll have to manually remove the lock file before you can start another instance of this script.
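One partial mitigation (a sketch, bash assumed): also trap the catchable termination signals, so that an ordinary kill still removes the lock file; only SIGKILL then remains unhandled.

```shell
trap 'rm -f "$LockFile"' EXIT     # runs on any normal exit, as in the script above
trap 'exit 2' HUP INT TERM        # turning these signals into an exit fires the EXIT trap too
```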


Best practice: remove the lock file if it is more than one day old

find "$LockFile" -prune -mtime +0 -exec rm -f {} \; -exec echo Deleted old {} \;

If a lock file as proposed by Don Cragun exists, you could also check if the process is still running, e.g. like

read A PID REST < "$LockFile"
if ! lsof -p "$PID" >/dev/null; then rm -f "$LockFile"; fi
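A variant without the lsof dependency, using kill with signal 0 (a sketch; note that signal 0 also fails with "permission denied" if the process exists but belongs to another user, so hold the result loosely):

```shell
read A PID REST < "$LockFile"
if ! kill -0 "$PID" 2>/dev/null; then
    rm -f "$LockFile"                  # owner is gone: the lock is stale
fi
```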

On many UNIX systems, /tmp is wiped clean during the boot process. The above code works great in this case.

But, if /tmp is not wiped clean by a reboot on your system, there is a small chance that your script was killed by a SIGKILL signal (preventing removal of the lock file), the system then rebooted, some other process started with the PID your script was using on the prior boot, and the above code then kills the wrong process. Even though the chance is small, on a system that doesn't clear /tmp on every boot I prefer manual intervention if a lock file is left around when it shouldn't be there.

I don't know if CentOS V6 normally clears /tmp on boot, nor whether you have changed the default CentOS boot sequence to keep /tmp as it was or to clear /tmp as part of your local boot sequence.


Absolutely. It was just an example that would need serious refinement.