Need help on rsync

Hi experts,

We need to copy 5 TB of data from one server to another (over a 10 Gbps link). We plan to run rsync -av remote:/<path> /local on the destination server, but there are a few special requirements:

  1. The copy should run only from 18:00 to 07:00 every day until it is complete. Is it a good idea to set up two cron jobs, one that starts the script at 18:00 and another that kills it at 07:00?
  2. The log should be emailed every 60 minutes. I noticed there is a lot of redundant information in the logs. Is there a way to filter it?
  3. Can the transfer speed be controlled (reduced a bit so that it won't choke the entire link)?
  4. The rsync version on the destination machine is 3.0.6. yum update rsync says this is the latest version available.
  5. I think there's no need for the rsync daemon; the standalone command should suffice. Please confirm.

Please advise, Many thanks!!

If you have several directories of about the same size under the main directory, you could run several rsync commands at the same time. With a full 10 Gbps connection there is a good chance you can get the files transferred in one night: at line rate, 5 TB is 40 Tb, which takes about 4000 seconds, a little over an hour. In practice it depends on how many files there are and how large each one is. IMHO, you are better off with a small number of large files than with a huge number of small ones. You could also try the --compress flag, but on a link this fast the CPU cost of compression may slow the transfer rather than speed it up, so test it on a sample first.
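To make the "several rsync commands at the same time" idea concrete, here is a rough sketch. The source path remote:/data and the subdirectory names are made up (the original path was not given), and the RSYNC variable is only there so you can swap in a stub while trying the script out.

```shell
#!/bin/sh
# Placeholder values, not from the original post: adjust before use.
SRC="${SRC:-remote:/data}"     # assumed source root
DEST="${DEST:-/local}"         # destination root
RSYNC="${RSYNC:-rsync}"        # override to test the script without copying anything

# Start one rsync per named subdirectory in the background,
# then block until all of them have finished.
parallel_copy() {
    for dir in "$@"; do
        "$RSYNC" -a "$SRC/$dir/" "$DEST/$dir/" &
    done
    wait
}

# Usage (hypothetical directory names):
#   parallel_copy projects archive home
```

Keep the number of directories you pass small; every name starts its own rsync, and a handful of streams is usually enough to find out whether the disks or the network are the limit.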

I might not even use cron for the transfer. I would probably just log in when I want the transfer to start and run rsync with nohup and an ampersand (&) at the end. You can then create a cron job to email the nohup.out file every hour if you want. The transfer might well finish in one night, although if you have millions of tiny files it could take longer. In that case, just log in before the window closes and, if the job is still running, kill it.
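Something along these lines is what I mean by starting it with nohup. Treat it as a sketch: the paths, the log and PID file names, the mail address, and the bandwidth cap are all placeholders I made up. --bwlimit takes KB/s on rsync 3.0.x, which also covers the question about throttling.

```shell
#!/bin/sh
# Placeholder values, not from the original post: adjust before use.
SRC="${SRC:-remote:/data/}"                  # assumed source (the real path was not given)
DEST="${DEST:-/local/}"
LOG="${LOG:-$HOME/rsync-copy.log}"           # hypothetical log file
PIDFILE="${PIDFILE:-$HOME/rsync-copy.pid}"   # hypothetical PID file
BWLIMIT="${BWLIMIT:-500000}"                 # KB/s, roughly 4 Gbps; pick your own cap
RSYNC="${RSYNC:-rsync}"

# Start the copy detached from the terminal, append its output to the log,
# and save the PID so the job can be stopped later.
start_copy() {
    nohup "$RSYNC" -av --partial --bwlimit="$BWLIMIT" "$SRC" "$DEST" >>"$LOG" 2>&1 &
    echo "$!" >"$PIDFILE"
}

# Example crontab line to mail the last 50 log lines every hour
# (admin@example.com is a placeholder address):
# 0 * * * * tail -n 50 "$HOME/rsync-copy.log" | mail -s "rsync progress" admin@example.com
```

If the copy does not finish in one window, running the same command the next evening carries on from where it stopped, because rsync skips files that already match on the destination.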

If you start it manually, you know the PID from the beginning, so you can even have a cron job kill that specific process at the end of the window. Just don't leave that cron job in place once the copy is done: PIDs get reused, and you would end up with a job that randomly kills whatever process happens to have that PID.
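One way to guard against the PID reuse problem is to check that the PID still belongs to rsync before killing it. This is only a sketch; it assumes the PID was saved to a file when the copy was started, and the file name is hypothetical.

```shell
#!/bin/sh
PIDFILE="${PIDFILE:-$HOME/rsync-copy.pid}"   # hypothetical file holding the saved PID
PROC_NAME="${PROC_NAME:-rsync}"              # command name we expect that PID to have

# Kill the saved PID only if it still belongs to the expected command,
# so a recycled PID owned by some other process is left alone.
stop_copy() {
    [ -f "$PIDFILE" ] || return 0
    pid=$(cat "$PIDFILE")
    # Linux exposes the command name in /proc; fall back to ps elsewhere.
    name=$(cat "/proc/$pid/comm" 2>/dev/null || ps -p "$pid" -o comm= 2>/dev/null) || name=""
    name=${name##*/}
    if [ "$name" = "$PROC_NAME" ]; then
        kill "$pid"
    fi
    rm -f "$PIDFILE"
}

# Example crontab line to stop the copy at 07:00
# (/path/to/stop-copy.sh is a placeholder for a script that calls stop_copy):
# 0 7 * * * /path/to/stop-copy.sh
```

Because the PID file is removed after each run, a leftover cron entry finds nothing to kill on later mornings.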

If you are worried about overloading the network, you can also try rsyncing to removable storage and then moving the drive to the server where the data needs to be. Another option, if you are using a SAN, is to take a SAN snapshot to create another SAN volume with the same data.