Detecting incoming files without busy polling

Hello,

I'd like to handle incoming (uploaded) files from a shell script, ideally without busy polling or waiting (e.g. running a cron task every 15 minutes). Is there a command that would just sleep until a new entry has been created in a directory, allowing scripts such as the following:

while watchdir $SOMEDIR
do
    # process new files here
done

The "watchdir" command should ideally only wake up when the new entries aren't being used (or at least written to) by anyone, so the file can safely be handled. This will have to run on HP-UX, but I am interested in Linux alternatives, or even a pointer to the underlying API or system call.

Thanks.

The way I see it, I'm afraid there is no simple workaround for your problem. I don't think such a function exists, not even in glibc. To do something like that, you would have to go through every process's file descriptors (in /proc/<pid>/fd on Linux) and check that none of them refers to a file inside that $SOMEDIR of yours.

Nowadays you usually shouldn't have to worry about those things, because well-behaved programs don't write to the final file directly. The way to do it is to write a temporary file and then rename() it to the location you want. rename() is atomic as long as the temporary file and the destination are on the same file system, so a reader never sees a half-written file under the final name.
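For uploads you control yourself, the write-then-rename idea can be sketched in shell as follows. This is only an illustration: publish_file is a made-up helper name, and it assumes the temporary file is created in the destination directory so that mv ends up as a plain rename() on one file system.

```shell
# publish_file SRC DEST_DIR (hypothetical helper, not an existing command):
# copy SRC into DEST_DIR under a hidden temporary name, then rename it into
# place so that readers never see a partially written file.
publish_file() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    tmp="$dest_dir/.$name.tmp.$$"

    # The temporary file lives in the destination directory, so the final
    # mv is a rename within one file system rather than a copy.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    mv "$tmp" "$dest_dir/$name"
}
```

A consumer that skips names starting with a dot will then only ever see complete files.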

In shell scripting, you could use

find location -mmin -1

every minute to see which files were changed in the last minute. Note that -mmin is not part of POSIX find, so it may be missing on HP-UX.
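Where -mmin is not available, a similar effect can be had with the POSIX -newer test and a marker file. The following is a sketch only: list_new_files and the marker path are made-up names, and the marker is assumed to live outside the watched directory.

```shell
# list_new_files DIR MARKER (hypothetical helper): print the regular files in
# DIR modified after MARKER was last touched, then advance MARKER.
# On the first run, when MARKER does not exist yet, every file is listed.
list_new_files() {
    dir=$1
    marker=$2
    next="$marker.next"

    # Stamp the start of this pass before scanning, so a file arriving during
    # the scan is not lost (it may be reported again by the next pass).
    touch "$next"
    if [ -e "$marker" ]; then
        find "$dir" -type f -newer "$marker" -print
    else
        find "$dir" -type f -print
    fi
    mv "$next" "$marker"
}
```

This still polls, and it still cannot tell whether a listed file has finished uploading.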

Anyway, wait for more replies, there could be something that at least would get you close to what you want.

If the files are transferred with a protocol that preserves their original timestamps, I guess "find -mtime" won't notice any difference.

Maybe a bit more reliable: run find over all the files, pass them with -exec or xargs to "cksum", write the output to a file, and compare it with the previous run at whatever interval you like.
Any difference, be it a changed sum, a new sum for a file that didn't exist before, or a sum missing for a removed file, will then show up.
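A rough sketch of that checksum comparison is shown here. The function names and the state file are made up for illustration, the state file is assumed to live outside the watched directory, and file names containing newlines are not handled. If your find does not support "-exec ... {} +", "-exec cksum {} \;" does the same job more slowly.

```shell
# snapshot DIR: print one "checksum size path" line per regular file, sorted.
snapshot() {
    find "$1" -type f -exec cksum {} + | sort
}

# dir_changed DIR STATE (hypothetical helper): return 0 if DIR differs from
# the snapshot saved in STATE (or if no snapshot exists yet) and save the new
# snapshot; return 1 if nothing changed.
dir_changed() {
    dir=$1
    state=$2
    snapshot "$dir" > "$state.new"
    if [ -f "$state" ] && cmp -s "$state" "$state.new"; then
        rm -f "$state.new"
        return 1
    fi
    mv "$state.new" "$state"
    return 0
}
```

Running diff on the old and new snapshots instead of cmp would also show which files were added, changed or removed. Reading every file on each pass is expensive, so this suits small directories better than large ones.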

Your solution is more reliable, but I agree only because modification times can be altered, which would fool find -mmin. Otherwise, I think upload protocols do update the modification time when a file is uploaded (even when overwriting). Maybe I misread what you meant.

Linux used to have something like FAM, which would alert you when a file was altered. This is being replaced by other mechanisms which are slightly less platform-specific, but there is nothing stable and standard just yet. If you have a dbus system, I think that includes this facility, but the name escapes me. There are discussions on this site about such systems, so searching along the lines of fam and dbus might help you find stuff you like.

@redoubtable

I wasn't thinking of deliberate modification. It's just that if you transfer files via scp without the -p flag, which preserves times and modes, the copies will have different times from the originals. With -p the times stay as they were on the source.
But I have to admit that it depends solely on what baldyeti wants, or whether it matters to him at all.

Good point. I was not thinking about scp uploads. Baldyeti didn't specify the protocol, so I assumed FTP or web upload (e.g. PHP), which generally modify mtime.

Thanks for your suggestions. Fam/imon is what I had in mind but couldn't remember the acronym. HP-UX doesn't seem to have it anyway. The files will likely be uploaded via FTP. Perhaps a simple "test -r" should do (and block further processing while another program is still appending). I was hoping something more modern and efficient than polling existed in a reasonably standardised way, but apparently not. Oh well...

Oops just noticed I wrote chksum. It's cksum^^

Someone already mentioned this, but I'll reiterate... The way this is handled on my production systems is to require a "rename" to some other directory on the same file system. So you UPLOAD the file to, say, /A/B/UPdir, and then you issue a rename: "REN /A/B/UPdir/file /A/B/UPdir/Commit/file". Your script polls the Commit dir, not the UPdir. That way, when a file appears in the Commit dir, you know it is fully uploaded. That solves the "is this file finished uploading" problem, but not the need for polling.
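The polling side of that scheme might look roughly like this. It is a sketch under assumptions: process_committed, handle_file and the "done" directory are made-up names, and processed files are moved away so they are not picked up twice.

```shell
# process_committed COMMIT_DIR DONE_DIR (hypothetical helper): one polling
# pass. Each regular file in COMMIT_DIR is handed to handle_file and, if that
# succeeds, moved to DONE_DIR so the next pass does not see it again.
process_committed() {
    commit_dir=$1
    done_dir=$2
    for f in "$commit_dir"/*; do
        # Skips subdirectories, and the unexpanded pattern when the
        # directory is empty.
        [ -f "$f" ] || continue
        if handle_file "$f"; then
            mv "$f" "$done_dir"/
        fi
    done
}

# Placeholder handler; replace with the real processing.
handle_file() {
    echo "processing $1"
}

# Typical use, polling once a minute (the "done" path is an assumption):
# while :; do process_committed /A/B/UPdir/Commit /A/B/done; sleep 60; done
```

Because files only enter the Commit dir through a rename, the loop never has to guess whether an upload is still in progress.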