script to monitor directory

What is the best way for a script to monitor a directory for the presence of files and then perform a function afterwards? I was hoping to have it run continually, sleeping until it detects that files are present in the directory, then break out of the loop and go on to the next step.

What it does is wait for files to be transferred to its local directory; when the files are present, it sftps them out to another host. I already have the sftp script working, I just need the proper syntax for a loop that sleeps until it detects files in the directory. Also, the file names and conventions will constantly change, so I was hoping for a condition based on a zero versus non-zero file count rather than anything else. This is a Linux box using bash as the shell.

thanks in advance guys.

Classic mailbox problem. One problem you have is: how do you know the files are really there, as opposed to partly there? Also, after you sftp them out, do you remove them?

Without addressing these problems (above), here's a simple script:

MAILBOX=some_directory
seq=0
while true; do 
   let seq=seq+1   # this might be bash specific
   if /bin/ls -1 "$MAILBOX" | grep -q ^ ; then
      # make sure incoming files aren't mixed up with files-to-be-processed
      mv "$MAILBOX" "$MAILBOX.$seq"
      mkdir "$MAILBOX"
      # your function
      sftp_this_directory "$MAILBOX.$seq"
      # clean up 
      rm -rf "$MAILBOX.$seq"
   fi

   sleep 300 # wait 5 minutes
done

One problem you have is: how do you know the files are really there, as opposed to partly there? Also, after you sftp them out, do you remove them?

Hello, thanks for the quick response! You have good points, and this has been discussed. We do remove them after transfer. Another note: although the names and byte sizes will change, the files will always end in *.dat. One of the other problems is, as you mention, what if the files are detected before they have finished transferring to the host, and get sent back out prematurely?

thanks

What is your if statement actually doing?

thanks

what about using the watch command?

thanks

also in your while statement, when would the condition not be true?
thanks

Better yet, create a script that runs "one" time: look for files (ls), use lsof to add logic for files currently being written to, then add a function for what you want to do with the files NOT being written to.

Place that script in a user's (root?) cron that runs every minute.
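The crontab entry for that might look like this (the script path is just an example, put it wherever you keep your scripts):

```
# m h dom mon dow  command -- run every minute, discard output so cron doesn't mail it
* * * * * /usr/local/bin/process_mailbox.sh >/dev/null 2>&1
```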

example of logic using lsof:

for F in `ls`
do
   VAL=`lsof "/full/path/to/$F"`
   if [ -z "$VAL" ]
   then
      your_function_here "$F"
   fi
done

Also, using find instead of ls can be better, as ls will also list directories, while find can restrict to regular files and gives full paths.

so it can look like:

for F in `find /my/dir/ -maxdepth 1 -type f`
do
   VAL=`lsof "$F"`
   if [ -z "$VAL" ]
   then
      your_function_here "$F"
   fi
done

Useful for "eye-balling" the status of a process or whatever. But there is no way to trigger a command automatically upon a condition, and do you really want to sit there and watch for files to come in? If you do, then:

watch ls -l $MAILBOX

is fine.

Never. You'd have to kill the process. It's equivalent to removing the while loop and running the script as a cron job every 5 minutes.

It's running the pipeline of ls and grep. The grep is looking for any line. If ls finds no files, it outputs no line, and the grep search fails. If the grep search fails, it exits with 1 which (in BASH logic) means false, and so the if condition fails. If the grep finds at least one line, this means there are files and so the if condition succeeds and the THEN portion is executed.
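You can see that exit-status behaviour directly in a shell. This is just a throwaway demonstration of the mechanism (the directory is a temp dir, not your mailbox); `grep -q` suppresses output but keeps the same exit status:

```shell
# grep exits 0 (true) only if it matched at least one line, so
# "ls | grep ^" is true exactly when the directory is non-empty.
dir=$(mktemp -d)                # starts out empty

/bin/ls -1 "$dir" | grep -q ^ && echo "files present" || echo "empty"

touch "$dir/a.dat"              # now the directory has one file

/bin/ls -1 "$dir" | grep -q ^ && echo "files present" || echo "empty"

rm -rf "$dir"
```

Run as-is it prints "empty" and then "files present".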

Are there other files in the directory besides these?

There are at least four solutions to this. In the worst case, you can use what ddreggors suggests. Here are three others to try:

  1. Move-after-write. Modify (or configure) the process that places the incoming file. When creating the file, it names the file in a distinctive way (different extension, prefixed with ., different path, etc). After closing, it moves / renames the file in a way your script expects.
  2. Keep control-log file. Modify (or configure) the process that places the incoming file. After closing the file, it appends the name of the file to a control file, kept inside the mailbox folder (as "something.ctl"). Your script will rename this file, and then read it for a list of file names to sftp.
  3. Use flock. This will only work IF the process which places the incoming files uses flock(2) on the files it creates. Have your script apply flock (the shell command) to each file before it is moved.
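Solution 1 works because a rename within one filesystem is atomic. Here is a minimal sketch (the directory and file names are made up for illustration), showing both sides of the convention:

```shell
# Sender side: write under a temporary name the watcher ignores,
# then rename. mv within one filesystem is atomic, so the watcher
# can never see a half-written *.dat file.
MAILBOX=/tmp/mailbox_demo
mkdir -p "$MAILBOX"

echo "payload" > "$MAILBOX/incoming.dat.part"             # still being written
mv "$MAILBOX/incoming.dat.part" "$MAILBOX/incoming.dat"   # now complete

# Watcher side: only fully-renamed *.dat files are ever considered.
for f in "$MAILBOX"/*.dat; do
    [ -f "$f" ] && echo "ready to sftp: $f"
done
```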

ddreggors solution is a bit resource-intensive, and VERY linux-specific, but it will work if the uploading process does not close/reopen the file in between writes AND if it removes the file after failure (if the upload process is interrupted and the file is not completely transferred). It will fail if there are any other benign processes reading the incoming files. Here is a re-write of that solution which is more efficient (ie, doesn't use additional forks):

cd "$MAILBOX"
LSOF=/usr/sbin/lsof
for f in *.dat; do
  if ! $LSOF "$MAILBOX/$f" >/dev/null ; then
     # lsof exits 0 when the file IS open, so we negate:
     # the file is open by no process -- trigger action goes here.
     sftp "$MAILBOX/$f" someone@somewhere:destination/$f
  fi
done

Nice rewrite and great points, but I thought I might mention that in the quote above you make it seem like it will NOT work if "the upload process is interrupted and the file is not completely transferred". While that is not what one would want, the process will still work; you will just end up with a blank or partially written file. What I mean to say is that the loop will still run and do what it is meant to do, but it will perform the actions on a file that is not intact. This would happen with ANY process as far as I know. If the transfer fails, you have a bad file and no way to tell without looking inside manually (vi, nano, etc...) or diffing against the original (which you might not have).

Now on the other hand, it would be a very nice feature if the upload/transfer process failed and the file was removed (as you already mentioned).

As to the Linux-specific code... my bad, I am from a Linux world and should remember that *nix usually means *more* than Linux. :wink:

I also make the assumption that this user wants a quick, easy-to-create solution. I base this on very little knowledge, granted, but there seems to be an air of "quick dirty hack" (not to be taken as a bad thing, just a real-world thing :)) in the initial question.

You're right. I shouldn't have been so critical of your solution. The user clearly stated linux and flock (as a shell-script) is distribution-specific.

Thanks to both of you guys, you brought up some really valid points. After reading them, I realized that a "quick dirty hack" might not be the best solution, as it is prone to numerous errors, mainly the sftp portion kicking off before the files have finished transferring in. So I am probably going to add as much logic as possible.

There will always be 4 files ending in *.dat, but the file names will change. They are the only files, and after transferring, the directory can be totally cleaned. Flock sounds like a good idea, but I'm not sure if the host that initially sends the files supports flock. I think I can use lsof, and I was also looking into the incron and fileschanged utilities. Have you guys ever used these within a script?

thanks

honestly I have never worked with either, but incron sounds like it is a good start:

(taken from Debian -- Details of package incron in sid)

  • notifying programs (e.g. server daemons) about changes in configuration
  • guarding changes in critical files (with their eventual recovery)
  • file usage monitoring, statistics
  • automatic on-crash cleanup
  • automatic on-change backup or versioning
  • new mail notification (for maildir)
  • server upload notification
  • installation management (outside packaging systems)
  • ... and many others

but fileschanged may not help:

(taken from fileschanged)

  • The file or directory that you want to monitor must exist at program start.

I am using RHEL 4, so incron is not available. I have installed fileschanged, which will monitor the creation of files as well, and I have tested it, but then it goes back to the same question as before: what if the next sftp step starts to process before the files are finished?

thanks

incron rpm build for Fedora 8:

Name : incron
Version : 0.5.5
Release : 1.fc7
Vendor : Fedora Project
Date : 2007-03-13 19:16:38
Group : System Environment/Base
Source RPM : incron-0.5.5-1.fc7.src.rpm
Size : 0.27 MB
Packager : Fedora Project <http://bugzilla.redhat.com/bugzilla>
Summary : Inotify cron system
Description :
This program is an "inotify cron" system.
It consists of a daemon and a table manipulator.
You can use it a similar way as the regular cron.
The difference is that the inotify cron handles
filesystem events rather than time periods.

for Fedora Core 8

taken from RPM Fedora 8 incron 0.5.5 i386 rpm

maybe the src.rpm can be rebuilt for rhel4

or look at the list at RPM Search incron

nulinux,

You're spot-on about fileschanged, but it can still be used on the directory to detect when new files are starting to arrive -- that way, you don't have to "poll" the directory.

Test the upload process with lsof. See if the upload process keeps the file open. If it does, you're in luck, and you can go with that. Just write a simple script like this:

while true; do 
  test -f "$DIR/$FILE" || continue   # bash has no "next"; use continue
  echo -n `date +%s`:
  if lsof "$DIR/$FILE" >/dev/null ; then
    echo "$FILE is open";
  else
    echo "$FILE is CLOSED";
  fi
done

Run it, then upload a file. Look at the output from the script to see if it's open until closed, or open/closed, open/closed, etc. While you are uploading, perhaps suspend the upload (with CTRL-Z, or unplug the net cable) to see if long delays have any effect.

Nice work otheus, I have been playing with that, and here is a test script that you can change as you need.

Some points,

  1. The file must exist so "touch" test1.txt first
  2. You will need to open 2 terminals and open the file you are testing in vi/vim on one and run this in another.
  3. Use sudo or run as root (lsof requires root on RedHat)
#!/bin/bash

# Assuming you will use fileschanged output to get the new filename,
# and also assuming that you will want to test THIS process before
# trying to incorporate fileschanged into the process...
# I wrote this as a test of the lsof side of the code.


# Assuming you test from your (not root) home directory and use sudo
# change as needed

WHO=`who am i |awk '{print $1}'`
DIR="/home/$WHO"
FILE=".test1.txt.swp"
CTR=0

while true
do
        # Attempt to sanely exit after 3 runs
        if [ "$CTR" -eq 3 ]
        then
                killall vim
        fi

        # Do your file open/closed logic here
        if [ -f "$DIR/$FILE" ]
        then
                LSOF=`lsof "$DIR/$FILE"`
        else
                unset LSOF
        fi
        echo -n `date +%s`:
        if [ -n "$LSOF" ]
        then
                echo "$DIR/$FILE is OPEN";
        else
                echo "$DIR/$FILE is CLOSED";
                exit
        fi
        CTR=`expr $CTR + 1`
done

here is the output:

1219411794:/home/ddreggors/.test1.txt.swp is OPEN
1219411794:/home/ddreggors/.test1.txt.swp is OPEN
1219411794:/home/ddreggors/.test1.txt.swp is OPEN
1219411794:/home/ddreggors/.test1.txt.swp is CLOSED

Notice that the epoch seconds in the output, "1219411794", are the same on all lines. Since that value is read fresh inside the while loop on each pass, this shows that four passes complete within a single second. You could probably add a "sleep x" line after the CTR increment to force it to wait x seconds between passes if you like.
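As a sketch of that sleep change, the relevant part of the loop would become (counter capped at 3 here just so the demo terminates):

```shell
# Throttle the polling loop: each pass now takes at least one second,
# so the epoch timestamps printed will differ from line to line.
CTR=0
while [ "$CTR" -lt 3 ]
do
        echo -n `date +%s`:
        echo " pass $CTR"
        CTR=`expr $CTR + 1`
        sleep 1
done
```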