Check if a file has finished copying

Hi all,

I have a script that monitors a hot folder. The script works fine, with one exception: when it runs while a file is still being copied to the hot folder.

What is the easiest way to check whether a file copy has completed? I'd like a solution in bash.

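If possible, copy the file under a temporary name, rename it afterwards, and have the monitor check only for the final name.

    cp somefile tofile.tmp
    mv tofile.tmp tofile
    # Only look for "tofile". Once it exists the file is complete, because mv
    # within the same filesystem only changes the directory entry; the file
    # contents stay the same.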

I found a way by checking the file size:

    prev_size=-1 # initialize variable
    new_size=$(/usr/bin/du -sk "$file_i" | awk '{print $1}') # get current file size (KB on disk)

    while [ "$prev_size" != "$new_size" ] # repeat until two consecutive samples match
    do
        /bin/sleep 5 # check every 5 seconds
        prev_size=$new_size # remember the previous sample
        new_size=$(/usr/bin/du -sk "$file_i" | awk '{print $1}') # get new file size
    done

Does anybody see anything wrong with the script above?

We get asked this all the time. You simply can't tell if a file is done from just looking at it. The upload script has to tell you somehow.

Besides, what if an upload happens while your script is running? The folder was clear when it started, but a file arriving halfway through could trip it up. You may have to make the script handle that case gracefully.

Can your monitor script be made to skip names beginning with ._ ? The upload script could create the file as ._filename and rename it to filename when it's done. This will also help you find broken uploads.
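
A minimal sketch of the monitor side of that convention, assuming uploads really do arrive as ._filename and get renamed once complete. The function name list_ready_files is made up for illustration.

```shell
#!/bin/bash
# Print the regular files in a hot folder that are ready for processing.
# Assumption: in-progress uploads are named "._<name>" and renamed when done.
list_ready_files() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip directories and an unmatched glob
        case ${f##*/} in
            ._*) continue ;;         # still uploading (or a broken upload)
        esac
        printf '%s\n' "$f"
    done
}
```

By default the * glob does not match names starting with a dot anyway, so the case test only matters if dotglob is enabled, but it makes the intent explicit.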

I have no control over incoming files. Different users will connect to this server and copy files into my hot folder.
Since the clients and the server are within the same subnet, speed is not an issue, so I was focusing on file size. Are there any downsides to monitoring file size?

I have used sum before, something like this:

    x=$(sum "$file" | awk '{print $1}')

Depending on what you need, you can store the result in a variable, then loop, run sum again, and see whether it changed.

The "cksum" command is safer than the "sum" command in this context. If you are on Linux rather than Unix, there are more commands to choose from.
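
A rough sketch of the checksum polling idea using cksum. The function name wait_until_stable and the 5-second default interval are assumptions, not anything from the posts above. Note that it only detects that the contents stopped changing between two samples, so a stalled transfer looks the same as a finished one.

```shell
#!/bin/bash
# Block until two consecutive cksum samples of a file are identical.
# Returns 0 once the file looks stable, non-zero if it cannot be read.
wait_until_stable() {
    local file="$1" interval="${2:-5}" prev cur
    prev=$(cksum < "$file") || return 1
    while :; do
        sleep "$interval"
        cur=$(cksum < "$file") || return 1
        [ "$cur" = "$prev" ] && return 0   # nothing changed between samples
        prev=$cur
    done
}

# Usage (hypothetical path):
#   wait_until_stable /hot/incoming.dat 5 && process /hot/incoming.dat
```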

The best simple design option is to send files under a temporary filename and rename them after successful transmission.
Another technique is to send a suitable "finished" dummy file after a successful main transmission.
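
For the "finished" marker technique, the receiving side could look something like this. It is only a sketch: the ".done" suffix and the function name list_completed are assumptions, and the sender has to create the marker after the data file is fully written.

```shell
#!/bin/bash
# Print data files whose "<name>.done" marker file exists.
# Assumption: the sender writes "<name>" first and "<name>.done" last.
list_completed() {
    local dir="$1" marker data
    for marker in "$dir"/*.done; do
        [ -f "$marker" ] || continue       # no markers yet
        data=${marker%.done}               # strip the marker suffix
        if [ -f "$data" ]; then
            printf '%s\n' "$data"
        fi
    done
    return 0
}
```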

A belt-and-braces approach is to send a uniquely numbered batch header file containing the filenames and checksums. Then send the data files, whose names are prefixed with the unique batch number, followed by a matching trailer file containing the same information as the header. After checking that the header and trailer both exist and agree, the receiving process should verify the checksum of every file in the batch before allowing the data to be processed.
It is actually easier to implement than it is to describe.
Btw, this is better than BACS.
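
To give an idea of the receiving side of the header/trailer scheme, here is a sketch under some loud assumptions: the header and trailer are called batch_<id>.hdr and batch_<id>.trl (invented names), and each is simply the output of running cksum on the data files, so every line reads "<crc> <bytes> <filename>".

```shell
#!/bin/bash
# Return 0 only if the header and trailer both exist, are identical,
# and every file they list matches its recorded cksum and size.
verify_batch() {
    local dir="$1" id="$2"
    local hdr="$dir/batch_$id.hdr" trl="$dir/batch_$id.trl"
    local crc size name actual
    [ -f "$hdr" ] || return 1              # header not arrived
    [ -f "$trl" ] || return 1              # trailer not arrived: batch incomplete
    cmp -s "$hdr" "$trl" || return 1       # header and trailer disagree
    while read -r crc size name; do
        [ -f "$dir/$name" ] || return 1    # listed file is missing
        actual=$(cksum < "$dir/$name")     # prints "<crc> <bytes>" for stdin
        [ "$actual" = "$crc $size" ] || return 1
    done < "$hdr"
    return 0
}
```

The sender would produce both files with something like "cksum 42_a.dat 42_b.dat > batch_42.hdr", sending the copy named batch_42.trl last.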

If you cannot modify the sending process, then testing whether the file is open with the "fuser" command is one option; others have suggested checking for file growth by various methods.
Checking with "fuser" or testing for file growth is no guard against failed transmissions.
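
A small sketch of the "fuser" test, assuming fuser is installed and that the monitor runs with enough privileges to see the process writing the file (typically the server-side daemon handling the transfer). The function name is_open is made up. It checks for PIDs on standard output rather than relying on the exit status.

```shell
#!/bin/bash
# Return 0 if fuser reports at least one process using the file,
# 1 if it reports none, 2 if fuser is not available.
is_open() {
    local pids
    command -v fuser >/dev/null 2>&1 || return 2
    pids=$(fuser "$1" 2>/dev/null)     # PIDs go to stdout, labels to stderr
    [ -n "$pids" ]
}

# Usage (hypothetical path):
#   is_open /hot/incoming.dat || echo "nobody has it open"
```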

Monitoring the size would mean having to stop and wait while you check whether it is still changing. It also wouldn't be reliable, since a file may not be done, just lagging.

I suppose, if you ignored files with modification times within the last 5 minutes, that'd work, if you're willing to put up with a lag of 5 minutes before you learn about new files. The smaller you make the time, the higher risk you have of mistaking a slow upload for a finished upload.
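
The modification-time idea could be sketched with find, assuming a find that supports -maxdepth and -mmin (GNU and BSD find do; they are not in POSIX). The function name list_settled is an invention, and the 5-minute default just mirrors the figure above.

```shell
#!/bin/bash
# Print regular files directly inside a directory that were last modified
# more than N minutes ago (as find rounds it); default N is 5.
list_settled() {
    local dir="$1" minutes="${2:-5}"
    find "$dir" -maxdepth 1 -type f -mmin +"$minutes"
}

# Usage (hypothetical path):
#   list_settled /hot 5 | while IFS= read -r f; do process "$f"; done
```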

Thank you all for your answers, I will explore some of the options presented!