Difference in file sizes when copying with scp. HELP!

Hello All,

I am transferring a gzipped file from Linux to Linux using the scp -C command.
It is a nightly job, called by crontab. After the copy finishes, the file sizes are different between source and destination. For example, the .gz file is 14782805941 bytes on the source and 13496172544 bytes on the destination. When I gunzip it I get the error message "unexpected end of file".
However, when I type EXACTLY the same command at the prompt, scp runs fine and both files are the same size.
I ran an md5sum check against both files to verify.
Everything is OK when I start scp manually, and I get corruption when the same command is run by crontab.

What could that be?? I really need to fix it, because these are actually the database backups, and until I fix this problem I have no usable database backup!

Thanks in advance!

Did you check without enabling compression, i.e. without the -C switch to scp? Can you post the crontab entry for this?

Is it a root cron?

To admin_xor:

Here is my entry:

20 3 * * 1,2,3,4,5,6 /data05/oradata/dpdump/cpFromDB.sh > /data05/oradata/dpdump/cpFromDB.log  2>&1

cpFromDB.sh

scp -C root@xxx.xx.xxx.3:/data05/oradata/dpdump/*.gz /data05/oradata/dpdump/.
ssh -l root xxx.xx.xxx.3 rm -f /data05/oradata/dpdump/*.gz /data05/oradata/dpdump/db*.log

And I am going to test without the compression right now.

Thanks!

---------- Post updated at 10:26 AM ---------- Previous update was at 10:08 AM ----------

I would think so, since the job is running as root.

Have you considered rsync? More checks and balances than using scp.
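
Just as a rough sketch (the flags are one reasonable choice, not a tested recipe; the paths are copied from the script you posted), the pull could look like:

# Pull with rsync over ssh instead of scp (assumes rsync is installed on both hosts).
# -a preserves times/permissions, -v is verbose; rsync also verifies each transferred
# file with its own checksum and re-sends it if the check fails.
rsync -av -e ssh root@xxx.xx.xxx.3:/data05/oradata/dpdump/*.gz /data05/oradata/dpdump/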

(Only asked about root in case there was a different ulimit).

Have you tried putting, say, a sleep 30 between the scp and the ssh? I'm serious (the second command deletes the source files and the copy's slave process could still be running).
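
In cpFromDB.sh that would look something like this (30 seconds is just the figure mentioned above, not a tuned value):

# cpFromDB.sh with a pause between the copy and the remote delete.
scp -C root@xxx.xx.xxx.3:/data05/oradata/dpdump/*.gz /data05/oradata/dpdump/.
sleep 30    # give any still-running transfer process time to finish before deleting
ssh -l root xxx.xx.xxx.3 rm -f /data05/oradata/dpdump/*.gz /data05/oradata/dpdump/db*.log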

I agree that compression is pointless on an already-compressed file, and that rsync is preferable if both computers support the command.

I just ran a test: I used cron to scp a file twice, with and without the -C switch. Both runs were OK and the file sizes matched. However, the test file was much smaller than the real one: the real file is 15GB, while the test file was only 30MB.

So, could it be that the size of the file is causing the problem?

I might only be able to run a test with the large file over the weekend; otherwise it gets in the way of our backup process.

---------- Post updated at 11:58 AM ---------- Previous update was at 11:54 AM ----------

I will put a sleep in the code. It will run tonight, and I will see the result Monday morning. It could be a very good catch: deleting before the previous process has finished.

In a manner of speaking. The longer the file, the longer the transfer takes and the longer anything processing the file is busy with it, so the more likely it is that some problem will break the connection before the transfer completes.

What's your network topology like between the two computers?

How long does the full-sized job take? I've seen large copies fail because the connection gets timed out after, say, 2 hours, because an intervening firewall has blocked the "keep alive" packets.
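
If an idle-connection timeout like that turns out to be the cause (which is only a guess at this point), one thing worth trying is SSH-level keepalives during the transfer, e.g.:

# Standard OpenSSH options; the values here are only examples.
# Send a keepalive every 60 seconds and give up after 5 unanswered ones.
scp -C -o ServerAliveInterval=60 -o ServerAliveCountMax=5 \
    root@xxx.xx.xxx.3:/data05/oradata/dpdump/*.gz /data05/oradata/dpdump/.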

To copy that 15GB file takes about 10 hrs.

I would think that if the file is 15GB you are going to need a much longer sleep before the rm -f is run.

I take it there are two processes here -- a local one which copies the files over, and another on the server, which runs unseen, and extracts the .gz files into logs.

You shouldn't be mixing and matching like that when they're both working in exactly the same folder. Either do it from one side, or the other side, not both. There will inevitably be mistakes when one or the other starts at exactly the wrong time.

If it had to be split, what I would do is this:

  • Local program: Upload files into /data05/oradata/tempfolder/ or somesuch. Once the upload's complete, move them into /data05/oradata/dpdump/. Incomplete uploads will never appear in the final destination (a rough sketch follows this list).
  • Remote program: Extract gz files in /data05/oradata/dpdump/, then delete the gz.
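
Adapted to the pull direction used in cpFromDB.sh, a minimal sketch of the local half might be (the tempfolder name is just an example, and it must sit on the same filesystem as dpdump for the move to be instantaneous):

# Copy into a staging directory first; only complete files ever appear in dpdump.
scp -C root@xxx.xx.xxx.3:/data05/oradata/dpdump/*.gz /data05/oradata/tempfolder/ \
  && mv /data05/oradata/tempfolder/*.gz /data05/oradata/dpdump/
# mv within one filesystem is a rename, so nothing downstream can see a half-written .gz.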

As an aside: Why is this running as root? All it's doing is operating on files, there's no reason to use root for that. That's dangerous.

Yes. I figured I am missing about 1.5GB of data, which would take about an hour to copy, so I put in an 80-minute sleep. I'll see the result on Monday.

I repeat: This is a bad scheme to use here. Either have one end control both operations so that they never run simultaneously, or never put partial files where they might get mangled -- only put in complete ones.

I "inherited" the crontab from an admin who is no longer here. Since I am not a sysadmin, I try to avoid making too many changes on the system. As far as I know, to run scp from crontab I (as a user) have to be setup on the other machine. I could probably run as oracle or some user. I will test it.

Fair enough, but you may have inherited something rather junky. If you have to put in sleeps "in case it's not done yet", something's very wrong with it.
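
One way to get rid of the guesswork entirely (just an illustration of letting one end control both operations, not something already in your script) is to make the delete depend on the copy's exit status and on the kind of md5sum comparison mentioned earlier in the thread:

# Illustration only: remove the remote files only if the copy succeeded
# and the checksums of the remote and local copies agree.
if scp -C root@xxx.xx.xxx.3:/data05/oradata/dpdump/*.gz /data05/oradata/dpdump/. ; then
    ssh -l root xxx.xx.xxx.3 'cd /data05/oradata/dpdump && md5sum *.gz' > /tmp/remote.md5
    if ( cd /data05/oradata/dpdump && md5sum -c /tmp/remote.md5 ) ; then
        ssh -l root xxx.xx.xxx.3 rm -f /data05/oradata/dpdump/*.gz /data05/oradata/dpdump/db*.log
    else
        echo "Checksum mismatch - leaving remote files in place" >&2
    fi
else
    echo "scp failed - leaving remote files in place" >&2
fi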

Sorry, I'm not sure I understand what you mean by
"I take it there are two processes here -- a local one which copies the files over, and another on the server, which runs unseen, and extracts the .gz files into logs"

My crontab runs:
20 3 * * 1,2,3,4,5,6 /data05/oradata/dpdump/cpFromDB.sh > /data05/oradata/dpdump/cpFromDB.log 2>&1

And here is my script:

cpFromDB.sh

scp -C root@xxx.xx.xxx.3:/data05/oradata/dpdump/*.gz /data05/oradata/dpdump/.
ssh -l root xxx.xx.xxx.3 rm -f /data05/oradata/dpdump/*.gz /data05/oradata/dpdump/db*.log

So, scp is the "local one which copies the files over", but where is the other one that runs "on the server, unseen, and extracts the .gz files into logs"? Is it the ssh line?

No problem with running such a copy as root. We need the root ulimit for this size of file transfer.
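
If there is any doubt about what limits the cron job actually runs with, it is easy to make the job report them itself (a quick diagnostic, not a fix) since the crontab entry already sends the script's output to cpFromDB.log:

# Add near the top of cpFromDB.sh for one run, then remove.
ulimit -f    # maximum file size the process may create ("unlimited" rules ulimit out)
ulimit -a    # the full set of limits in the cron environment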

Hmm. 10 hours to copy 15 GB suggests something like a 4 Mbit/s leased line rather than, say, a Gigabit local LAN connection. Maintaining a 100% reliable WAN connection across a leased line for 10 hours is a dream. The rsync command across a slow link is a lot better at dealing with network glitches than scp.

Have you considered a local (relatively quick) backup followed by a slower secondary backup of that backup to the remote site utilising rsync?
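
As a sketch of that two-stage idea (the host name and flags are illustrative, not from this thread): let the dump land locally as it does now, then push it across the slow link with rsync, which can resume an interrupted transfer instead of starting over:

# Stage 2 only - pushing the finished local dump to the remote site over the WAN.
# --partial keeps a partly transferred file so a dropped link resumes where it left off;
# --timeout abandons a stalled transfer instead of hanging forever.
rsync -av --partial --timeout=600 -e ssh \
    /data05/oradata/dpdump/*.gz root@backuphost:/data05/oradata/dpdump/
# "backuphost" is a placeholder for the machine at the remote site.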

I was confused for a while, pardon me. I got the source and destination mixed up.