Script to Exclude Files That Are Still Being Transferred

Hi.

I want to schedule a job on a directory with several files in it.

But there may be a situation where some of the files are still being transferred when the scheduled job runs.

So I want to skip those files from being processed.

Two methods come to mind:

  1. Process only files whose modification time is more than 15 minutes old.
  2. Exclude files whose size in bytes is still changing (i.e. still transferring).

But I have no idea how to start with either method.

So I would appreciate it if the UNIX experts here could suggest the best way to handle this kind of situation.

$ uname -a
HP-UX system1 B.11.31 U ia64 0189138652 unlimited-user license

lsof not available.

Thank you.

You don't mention what OS you are running on.

If your OS has lsof installed you may be able to do something like:

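# lsof exits with status 0 when some process has the file open
if lsof -- "$PROC_FILE" > /dev/null
then
   echo "$PROC_FILE is busy - probably still being transferred"
else
   # code to process the file goes here
   :   # placeholder so the empty branch is valid shell
fi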

Thanks Chubler.

$ uname -a
HP-UX system1 B.11.31 U ia64 0189138652 unlimited-user license

I tried lsof yesterday. It's not available on my system.

Thanks.

You could use perl to fetch the file's modification time and compare it to the current time like this:

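PROC_FILE="/var/dump/trn_20163002.24"
# modification time of the file, in seconds since the epoch
FILETIME=$(perl -e 'printf "%d",((stat(shift))[9])' "$PROC_FILE")
# current time, also from perl: date +%s is an extension that the HP-UX date may not support
NOW=$(perl -e 'print time')

if ((NOW - FILETIME > 15*60))
then
   # file is older than 15min - so process
   :   # placeholder so the empty branch is valid shell
fi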

Edit: Not sure if perl will be in your PATH; try /usr/contrib/perl if it's not found.
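
If you would rather try the second idea from the question (skip files whose size is still changing), here is a rough sketch. The function name is_stable and the default wait of 10 seconds are my own assumptions, not anything standard, and a transfer that stalls for longer than the wait would still look finished, so treat it as a starting point only.

```shell
# is_stable FILE [WAIT_SECONDS]
# Hypothetical helper: succeeds if FILE has the same size before and
# after a short wait, i.e. nothing appears to be appending to it.
is_stable() {
    st_file=$1
    st_wait=${2:-10}          # assumed default; tune to your transfer speed

    # Field 5 of "ls -ldn" is the size in bytes (-n keeps owner/group numeric)
    st_size1=$(ls -ldn "$st_file" | awk '{print $5}')
    [ -n "$st_size1" ] || return 1

    sleep "$st_wait"

    st_size2=$(ls -ldn "$st_file" | awk '{print $5}')
    [ -n "$st_size2" ] || return 1

    [ "$st_size1" = "$st_size2" ]
}
```

You would then call it as: if is_stable "$PROC_FILE" 30; then (process the file); fi. Note that the sleep is paid once per file, so for a large directory it may be better to record all sizes first, sleep once, and compare afterwards.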

Yes, you could use 'mtime' to exclude these files but there are many ways of doing this and I'm sure you'll get a number of ideas posted here.

Having done this kind of thing countless times, my preferred method is to create a timestamp file at the end of each run:

date > timestamp

so that the modification time of that file (held in its inode) records when the last run ended.

Then on the following run I do:

find . ! -newer timestamp ...............

to select all files NOT newer than that timestamp.

If, for example, this is a cron job running every 15 minutes, using this method ensures that if a run is missed for some reason (e.g. system down), the next run but one will pick up the backlog.

This method can create a 'moving window' behind the 15 minute allowance for incoming transfers to complete without the chance of selecting a file still being written to.

You then overwrite the timestamp file at the end of each successful run ready for next time.

(Of course, you may need a mechanism to prevent the selected list just getting longer and longer on each run by perhaps moving the files transferred elsewhere or deleting the files successfully transferred, but that wasn't what you asked.)
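
Putting those steps together, a minimal sketch might look like the following. The names INCOMING, STAMP, process_file and run_once, and the two default paths, are placeholders I made up for illustration; substitute your own. I keep the timestamp file outside the incoming directory so that find does not select it as a data file.

```shell
#!/bin/sh
# All names and paths below are hypothetical placeholders.
INCOMING=${INCOMING:-/var/dump}                 # directory receiving transfers
STAMP=${STAMP:-/var/tmp/incoming.timestamp}     # kept outside INCOMING on purpose

# Stand-in for the real work done on each selected file.
process_file() {
    echo "processing $1"
}

run_once() {
    # Very first run: there is nothing to compare against yet, so only
    # create the timestamp and leave the files for the next run.
    if [ ! -f "$STAMP" ]; then
        date > "$STAMP"
        return 0
    fi

    # Select regular files NOT newer than the end of the previous run.
    find "$INCOMING" -type f ! -newer "$STAMP" -print |
    while IFS= read -r f
    do
        process_file "$f"
    done

    # Overwrite the timestamp ready for next time.
    date > "$STAMP"
}
```

The script would call run_once as its last line and be started from cron. In a real job process_file should move or delete each file once it has been handled, otherwise the same files are selected again on every run, and you would only want to refresh the timestamp when the run actually succeeded.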

That's my suggestion but without doubt others will do it differently.

Hope that helps. Perhaps it doesn't.

Thanks so much.

It works as expected! I really appreciate it.

Thanks.

Thanks for your input as well.