How to ignore incomplete files

On Solaris & AIX, suppose there is a directory 'dir'.
Log files of size approx 1MB are continuously being
deposited here by scp command. I have a script that scans
this dir every 5 mins and moves away the log files that
have been deposited so far.

How do I design my script so that I pick up *only* those
files that have been completely deposited? For example,

[/mylogs] $ ls -l
-rwxr-xr-x 1 root bin 1124124 Jan 9 02:26 log3225
-rwxr-xr-x 1 root bin 1092534 Jan 9 02:33 log3228
-rwxr-xr-x 1 root bin 1130932 Jan 9 02:39 log3230
-rwxr-xr-x 1 root bin 369644 Jan 9 02:46 log3235

the file 'log3235' has not completely been deposited yet.

  • We are using rsync to synchronise this directory to another 4 servers, and we don't want to copy the incomplete files. Is there any way to ignore those?

Any help will be much appreciated.

Kind Regards

For solving your problem, 3 solutions come to my mind:

1) If you are ABSOLUTELY sure that every fully transferred file is larger than, for example, 1000000 bytes, you can select only the files you're interested in with a simple ls/awk script which checks the file size.
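
A minimal sketch of the size check, using a shell loop and wc instead of ls/awk. The threshold MIN_BYTES and the function name list_large_files are illustrative assumptions, and it assumes a POSIX shell (ksh, or /usr/xpg4/bin/sh on Solaris) rather than the old Bourne /bin/sh:

```shell
#!/bin/sh
# Sketch: print regular files in a directory that are at least MIN_BYTES long.
# MIN_BYTES and list_large_files are hypothetical names for this example.
MIN_BYTES=${MIN_BYTES:-1000000}

list_large_files() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # wc -c may pad its output with blanks on some systems; strip them
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -ge "$MIN_BYTES" ]; then
            echo "$f"
        fi
    done
}
```

This only works if no complete file can ever be smaller than the threshold, and a file that is still growing past the threshold would be picked up too.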

2) You can check whether a file is still being transferred by running the "fuser" command on it and checking whether one or more processes are accessing it. If so, the examined file is incomplete.
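
A rough sketch of the fuser check. It relies on fuser printing the PIDs on standard output and everything else on standard error, so an empty standard output is taken to mean "nobody has the file open". The names is_in_use, list_idle_files and the FUSER variable are assumptions made for this example, and seeing another user's processes may require running it as root:

```shell
#!/bin/sh
# Sketch: list files that no process currently has open, according to fuser.
# FUSER can be overridden, e.g. to give a full path to the fuser binary.
FUSER=${FUSER:-fuser}

is_in_use() {
    # Keep only the PID list from stdout; discard the report on stderr
    pids=$("$FUSER" "$1" 2>/dev/null | tr -d ' \n\t')
    [ -n "$pids" ]
}

list_idle_files() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        is_in_use "$f" || echo "$f"
    done
}
```

Note that a file which is idle at the moment of the check could still be a failed or interrupted transfer, so this does not prove the file is complete.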

3) You transfer an empty "flag" file after the real data file has been transferred to the destination. That way you can pick up only the files which have a corresponding flag file and ignore all the others. I think this is the best and most reliable solution (or at least, the one I prefer and regularly adopt in doing things like this :) )

Thanks for the details. We have thought about all these options... Since we do the scp using a wildcard, option 3 (the flag file) is not possible...

Anyway, do you know of any option in rsync to ignore the incomplete files when you rsync from one server to multiple ones?

Unfortunately I've never used rsync, but I think no program could determine if a file is incomplete or not. An "incomplete file" is a concept which implies knowing the contents and the meaning of the files involved in the transfer.

I think you could easily implement the third solution even if you use wildcards. Simply "expand" the wildcards before sending and generate a flag file for every entry. Then, after rsyncing the data files (with wildcards) you have to copy all the flag files.

You could for example generate empty files called:

log3225.flag
log3228.flag
log3230.flag
...

and so on, and then transfer all the *.flag files to the remote site.
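
Something along these lines might do it; treat it as a sketch. make_flags would run on the sending side and list_flagged on the receiving side. Both function names, the log* pattern and the separate flag directory are assumptions made for this example:

```shell
#!/bin/sh
# Sender side: create an empty NAME.flag in flagdir for every log* file in srcdir.
make_flags() {
    srcdir=$1
    flagdir=$2
    for f in "$srcdir"/log*; do
        [ -f "$f" ] || continue
        : > "$flagdir/$(basename "$f").flag"
    done
}

# Receiver side: print only the data files whose flag file has arrived.
list_flagged() {
    for flag in "$1"/*.flag; do
        [ -f "$flag" ] || continue
        data=${flag%.flag}
        if [ -f "$data" ]; then
            echo "$data"
        fi
    done
}
```

The sender would copy the data files first (with the wildcard, as now) and only afterwards copy the contents of the flag directory, so a flag can never arrive before its data file is complete.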

Hi.

I would let rsync handle the details. If a file is "incomplete" in one period, then rsync will copy as much as it can. Then chances are good that it will be complete in the next, and rsync will finish it.

My understanding of the design of rsync is that it transfers a minimum of data, so that you'll be transferring about the same amount of data regardless of what rsync does - transfers it all at once or in pieces ... cheers, drl

That's the problem.. When rsync copies the incomplete files, the target server picks up the incomplete file and processes it, which we don't want.. That's why we need some kind of way to stop the incomplete files being picked up by rsync!!!

Hi.

Unless there is a prior agreement between the creating process and the rest of the universe, I don't think there is any method to guarantee that a file is "complete".

I would attempt to address this by having the creating process set some completion flag, create an additional "unlocked-now" file, etc. -- perhaps similar to what robotronic suggested. You might investigate the ownership of the files; placing the files in a holding area until the next one begins to be created, and then moving the assumed-to-be-complete file to the transfer directory; writing a wrapper script around the creating process to create the completion signal; and methods along that line.
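
A minimal sketch of the holding-area idea, assuming the files arrive one at a time so that every file except the most recently modified one can be treated as complete. The function name and the two directory arguments are hypothetical, and it parses ls output, which is only reasonable for simple names like log3225:

```shell
#!/bin/sh
# Sketch: move every file except the newest from the holding area
# to the directory that rsync actually synchronises.
promote_all_but_newest() {
    holding=$1
    outgoing=$2
    # ls -t lists newest first; sed 1d drops that newest entry
    ls -t "$holding" | sed 1d | while IFS= read -r name; do
        [ -f "$holding/$name" ] || continue
        mv "$holding/$name" "$outgoing/$name"
    done
}
```

If both directories are on the same filesystem, mv is a rename, so rsync pointed at the outgoing directory never sees a partially written file. The last file of a batch stays in the holding area until a newer one shows up.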

I'll be interested in additional comments ... cheers, drl