Building a better mouse trap, or How many lines of code does it take to trap a mouse?

Hello all,

I'm hoping to get a little insight from some of the wily veterans amongst you.

I've written a script that checks for new outgoing files to our vendors on our SSL server. It seems to be working OK, but my question here is really one of logic and/or whether there's a better way to do it.

First, a little background: the program is run every 5 minutes from cron. The files are uploaded via NFS or CIFS, so file dates can't be fully trusted; I use find -cmin for the dates instead. Files remain on the server for 10 days.
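The age check itself is just plain find (a minimal example; the -15 here stands in for the real window, i.e. the 5-minute cron interval plus any sleep time added while waiting out transfers):

    # files whose inode change time (ctime) is newer than 15 minutes
    find /ftp -type f -cmin -15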

Process:
1) Check for a PID file. If the PID file exists, exit (the program is still running). If not, generate a PID file.

2) Check the filesystem size for changes since the last run. If there are no changes, clean up the PID file and exit (no new files). If it changed, sleep 1 minute (files may still be transferring) and loop until the changes stop, then add the total sleep time to the find time and continue to step 3 (transfer done).

3) Using the find command, build a file containing the list of new files in the ftp directory newer than the specified cmin time.

4) Filter through the file built in step 3. Generate an email for each vendor with the file names and send it to that vendor's contact.

5) Clean up the PID file. Copy the stat files to backups for comparison on the next program run. Exit. (A rough sketch of these steps follows below.)
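Stripped down, it looks like this (the paths and state-file names are placeholders, and the vendor lookup and mail formatting are glossed over; this is the shape of the thing, not the real script):

    #!/bin/sh
    # 1) PID file as a crude lock
    PIDFILE=/var/run/ftpwatch.pid
    [ -f "$PIDFILE" ] && exit 0              # previous run still going
    echo $$ > "$PIDFILE"

    # 2) compare filesystem usage to the last run; wait out active transfers
    SLEPT=0
    NOW=`df -k /ftp | awk 'NR==2 {print $3}'`
    LAST=`cat /tmp/ftpwatch.last 2>/dev/null`
    if [ "$NOW" = "$LAST" ]; then
        rm -f "$PIDFILE"                     # no new files
        exit 0
    fi
    while [ "$NOW" != "$LAST" ]; do
        LAST=$NOW
        sleep 60
        SLEPT=`expr $SLEPT + 1`              # minutes spent waiting
        NOW=`df -k /ftp | awk 'NR==2 {print $3}'`
    done

    # 3) list files newer than the cron interval plus the time we slept
    find /ftp -type f -cmin -`expr 5 + $SLEPT` > /tmp/newfiles

    # 4) ...one mail per vendor built from /tmp/newfiles (lookup omitted)...

    # 5) save state for the next run and clean up
    echo "$NOW" > /tmp/ftpwatch.last
    rm -f "$PIDFILE"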

Like I said, this is working, but a few files slip through the cracks.

What I would like to know is whether you have any thoughts on better ways to do this.

One idea I've been looking into:
Generate a full file list every 5 minutes and use diff to generate the outgoing file list.
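Roughly like this (just a sketch; the list file names are made up, and the listings are sorted so diff behaves):

    # snapshot the tree, then diff against the snapshot from 5 minutes ago;
    # lines diff marks with '>' are files that appeared since the last run
    find /ftp -type f | sort > /tmp/list.now
    diff /tmp/list.prev /tmp/list.now | sed -n 's/^> //p' > /tmp/newfiles
    mv /tmp/list.now /tmp/list.prev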

Also, this started out as a small server, so checking for filesystem changes was no problem. Now I have roughly 180 vendors accessing the site, and with all the changes to the filesystem size the program will sometimes run for 15-20 minutes, regardless of how the list is built. I would think that once the list is generated, I could just check those files' sizes for changes; once they finish transferring, generate the mail and wait for the next go-round to pick up additional files.
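The per-file check could be as simple as this sketch (it reads the /tmp/newfiles list from the sketch above; wc -c gets the size so it works without GNU stat, and I'm ignoring names with spaces in them):

    # hold off mailing until each candidate file's size stops changing
    while read f; do
        SIZE1=`wc -c < "$f"`
        sleep 30                              # give the transfer a chance
        SIZE2=`wc -c < "$f"`
        while [ "$SIZE1" != "$SIZE2" ]; do    # still growing; keep waiting
            SIZE1=$SIZE2
            sleep 30
            SIZE2=`wc -c < "$f"`
        done
    done < /tmp/newfiles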

So what's the general consensus? Thoughts, ideas, opinions?

Thanks in Advance,
MPH

I'd rather have a bottle in front of me than a frontal lobotomy.

Not sure that I understand. Is this one directory or a directory tree? How do the files get removed? Anyway...

I would loop through all the files, getting name and size (if the date cannot be trusted, ignore it). Add the name and size to a little database somewhere, timestamping this addition; or, if the entry is already present, update the size and timestamp. Then loop through the database and find entries with old timestamps; process these, and remove them from the database and the directory. (Removal not possible? Mark them as processed in the database.)
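Roughly like this (just a sketch; a flat file stands in for the "little database", and names with spaces are ignored):

    # "database" is a flat file of: name size last-changed-timestamp
    DB=/tmp/ftpdb
    NOW=`date +%s`
    find /ftp -type f | while read f; do
        SIZE=`wc -c < "$f"`
        OLD=`awk -v n="$f" '$1 == n {print $2}' $DB 2>/dev/null`
        if [ "$OLD" != "$SIZE" ]; then
            # new or still growing: (re)write the entry, fresh timestamp
            awk -v n="$f" '$1 != n' $DB 2>/dev/null > $DB.new
            echo "$f $SIZE $NOW" >> $DB.new
            mv $DB.new $DB
        fi
    done
    # entries untouched for 10+ minutes have stopped growing: process those
    # (then delete them from $DB, or mark them processed)
    awk -v now="$NOW" 'now - $3 >= 600 {print $1}' $DB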

Perderabo,

This is a directory tree, /ftp. Under this there are the users and their incoming and outgoing directories. Each user has their own directory for security reasons; our customers don't want their data available to the wrong vendors.
Files get removed by another daily cron job that finds files older than 10 days. The date can't be trusted as far as how many minutes old a file is, but find still works fine for removing the old files. If a file is transferred via CIFS, it keeps the modification date it had before the transfer. That's why I use -cmin: it keys off the ctime, which is set when the file lands on the server, and it seems to work well. But I think that's where some files fall through. I had to set up ntp on the server because clock variations between the server and the clients were causing problems with file times. Another reason to use the "find all files and diff them" logic.
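You can see the split on a transferred file with GNU stat (the path here is made up): the modify time comes over from the client, while the change time gets stamped locally when the file lands on the server, which is what -cmin keys off.

    # %y = mtime (carried over from the client), %z = ctime (set on arrival)
    stat -c '%y  %z  %n' /ftp/somevendor/outgoing/part001.igs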

This is similar to what (I guess) I was trying to say with the idea I was looking into. That is: find all the files under /ftp/*/outgoing and diff them for additions against the file list built 5 minutes ago. Using the diffed file names, the "database" would simply be a temp file containing each name and size. Grep for the file, awk the $NF for the size, and compare until they're the same, sleeping for a bit between checks to avoid frantic looping. When the run is finished, delete the temp database. Removed files won't be an issue, since I'm only looking for files added between runs. If a file reappears, there's usually a good reason for it (corrupted IGES files, etc.) and the vendor should be re-notified.
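In sketch form (made-up file names again; /tmp/newfiles is the diff output from my earlier sketch, and I'm glossing over paths with spaces or regex characters in them):

    # temp "database": one "<path> <size>" line per newly diffed file
    DB=/tmp/outgoing.$$
    while read f; do
        SIZE=`wc -c < "$f"`
        echo "$f $SIZE" >> $DB
    done < /tmp/newfiles

    # grep each entry back out, awk the size, re-check until it settles
    while read f REST; do
        RECORDED=`grep "^$f " $DB | awk '{print $NF}'`
        CURRENT=`wc -c < "$f"`
        while [ "$CURRENT" != "$RECORDED" ]; do
            RECORDED=$CURRENT
            sleep 30                      # avoid frantic looping
            CURRENT=`wc -c < "$f"`
        done
    done < $DB

    # ...generate the vendor mail from the settled list, then:
    rm -f $DB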

I hope this makes sense. My fingers are too well connected to my brain.

:confused: Hmmmm... I gotta learn to leave these hyper-abstract problems alone.

:rolleyes: I knew I shouldn't have gone to the Picasso school of communication :frowning: