I have a simple script that copies a directory from one place to another and then deletes the source.
I am facing a situation where new files get added after the script has started running, and it's resulting in data loss.
Please suggest a way to avoid the data loss. I googled a lot, but most of the solutions are in Perl; I am looking for something in shell script.
What proportion of the files show up as changed when you do 'ls -ltr'? What is the frequency of change, i.e. how often are these files updated?
A cyclic check on the files being transferred would be a better idea. Do you expect the files to change every second, every minute, or what? Work out the mean interval and then start transferring them.
Secondly, you could try a register file: write down each file as it is transferred, and skip those entries in the next run of the script.
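The register-file idea above could be sketched roughly like this. All the paths here (SRC, DEST, REGISTER) are invented placeholders for illustration, not anything from the original post:

```sh
#!/bin/sh
# Sketch of the "register file" idea: keep a list of files already
# transferred and skip them on subsequent runs of the script.
# SRC, DEST, and REGISTER are placeholder paths.
SRC=/tmp/demo_src
DEST=/tmp/demo_dest
REGISTER=/tmp/demo_register

mkdir -p "$SRC" "$DEST"
touch "$REGISTER"
echo "data" > "$SRC/sample.txt"   # seed one file for demonstration

for f in "$SRC"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    # Skip anything recorded in the register by a previous run.
    if grep -qxF "$name" "$REGISTER"; then
        continue
    fi
    # Record the file only after it was copied and removed successfully.
    cp -p "$f" "$DEST/$name" && rm -f "$f" && echo "$name" >> "$REGISTER"
done
```

Note the `&&` chaining: a file is only deleted and registered if the copy succeeded, so a failed transfer gets retried on the next run.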
mv is safe if source and destination are on the same file system.
On different file systems, it must copy the data, like cp.
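When the destination is on a different filesystem, a defensive variant is to copy first and remove the source only if the copy succeeded. A minimal sketch, with made-up file names:

```sh
#!/bin/sh
# Cross-filesystem "move" that deletes the source only after a
# successful copy.  Paths are illustrative placeholders.
src=/tmp/demo_xfs_src.txt
dest=/tmp/demo_xfs_dest.txt

echo "payload" > "$src"

if cp -p "$src" "$dest"; then
    rm -f "$src"                              # remove only on success
else
    echo "copy failed; source preserved" >&2  # source stays intact
fi
```

Within a single filesystem, plain `mv` is preferable because the rename is atomic; the copy-then-remove pattern only matters when `mv` has to fall back to copying.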
--
It might help to copy only files whose ctime is older than 1 hour.
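One way to express that age cutoff is find's `-cmin` test (a GNU/BSD extension, not strict POSIX). The directory names below are placeholders:

```sh
#!/bin/sh
# Copy only files whose inode change time (ctime) is more than 60
# minutes in the past; anything newer is assumed still in flight.
# Directory names are placeholders.
mkdir -p /tmp/demo_old_src /tmp/demo_old_dest
touch /tmp/demo_old_src/fresh.txt   # just created, so it is skipped

find /tmp/demo_old_src -type f -cmin +60 \
    -exec cp -p {} /tmp/demo_old_dest/ \;
```

On systems without `-cmin`, POSIX `-ctime +0` gives a coarser, day-granularity version of the same idea.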
No offense intended (just honesty), but your problem statement is useless. Given its utter lack of specificity, I'm surprised anyone invested any of their time in responding to it.
Accurate answers to the following questions will probably lead to a quick resolution:
What operating system are you using?
What are the exact commands used to add files to the current directory?
What are the exact commands used to copy the files to their new location?
Are these two directories part of the same filesystem?
What are the exact commands (if any) that are run as part of any subsequent clean up?
What exactly do you mean by data loss? Are entire files missing? Are you seeing partially complete files? Something else?
For all we know, your problem may be as simple as misusing 'rm -fr' when 'rmdir' is required.
In the future, if you would like accurate, focused assistance, save everyone (yourself included) time and be specific from the start.
As another wild guess here, it sounds to me like ningy is starting to copy files while they are being written and then removes them at the source (while they are still being written). I think the code needs to be modified to be sure the file is complete before the move starts.
But it failed while some new files were still being written to the source.
I have just started exploring the find command with the -newer option: create a marker file just before the copy, then check it before removing the source to see if new files have been added. Not sure if that's the best thing to do, but it's a start at least.
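The marker-file check being described could look something like the sketch below. The paths are invented, and the two `sleep 1` calls only exist to simulate files arriving before and after the marker on filesystems with one-second timestamp resolution:

```sh
#!/bin/sh
# Sketch of the marker-file check: touch a timestamp before the copy,
# then refuse to delete the source if anything newer has appeared.
# Paths are placeholders.
SRC=/tmp/demo_marker_src
MARKER=/tmp/demo_marker_stamp

mkdir -p "$SRC"
touch "$SRC/old.txt"
sleep 1
touch "$MARKER"           # the copy of $SRC would run here
sleep 1
touch "$SRC/late.txt"     # simulates a file arriving mid-copy

if [ -n "$(find "$SRC" -type f -newer "$MARKER")" ]; then
    echo "new files arrived; skipping removal of $SRC"
else
    rm -rf "$SRC"
fi
```

Here `late.txt` arrives after the marker, so the script reports it and leaves the source directory intact rather than deleting it.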
If I understand what you're saying, it won't solve your problem. You don't need to know if a file is new before you remove it; you need to know that a file is complete before you start copying it. You can only do that by having the server provide some indication that the data in the new file is complete. The client can't reliably know that the source file on the server is complete unless the server provides some unambiguous way to determine that.
What MadeInGermany recently proposed is a big step in the right direction, but there is still no guarantee that the process loading the file being copied will not have been sleeping or "swapped out" while the copy to the client was being processed.
You could easily eliminate all of your headaches if you had a directory on the same filesystem as "source" that was dedicated to files in flight. Since it knows when it's done with a file, the script writing the file should be in charge of mv'ing from the in-flight directory to "source". This way, every file in "source" is guaranteed to be complete.
In my opinion, this is the simplest and most robust solution. Anything else will be either more complicated or less dependable or both.
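The in-flight directory pattern described above can be sketched as follows. The staging and source paths and the file name are invented for illustration:

```sh
#!/bin/sh
# Sketch of the in-flight directory pattern: the writer creates files
# in a staging directory on the SAME filesystem as "source", then
# mv's them into "source" when complete.  Within one filesystem, mv
# is an atomic rename, so the copying script never sees a partial
# file.  Paths are placeholders.
INFLIGHT=/tmp/demo_stage/inflight
SOURCE=/tmp/demo_stage/source

mkdir -p "$INFLIGHT" "$SOURCE"

# --- writer side: build the file in the staging area ---
echo "all the data" > "$INFLIGHT/report.txt"   # may take a long time
mv "$INFLIGHT/report.txt" "$SOURCE/report.txt" # atomic hand-off

# --- reader side: everything in $SOURCE is complete by construction ---
for f in "$SOURCE"/*; do
    [ -f "$f" ] && echo "safe to copy: $f"
done
```

The key design point is that both directories sit on one filesystem, so the final `mv` is a rename rather than a copy; the file is either absent from "source" or fully present, never half-written.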