Detecting data loss during FTP

sameerbo · June 17, 2003, 4:53am

Hi,

How can we detect that there has been data loss during an FTP transfer, through shell scripting?

I have gone through the FTP return codes, but none of them indicates that there has been any data loss.

Can we use FTP return code 226 as an indication that there has been no data loss during the file transfer? If yes, how good would this assumption be?

TiA,
Sameer.

Perderabo · June 17, 2003, 7:25am

FTP is based on TCP, which guarantees reliable data transmission. If a TCP packet is garbled or fails to arrive, the TCP layer will not acknowledge it. That causes the sender to retransmit. The FTP program has no way to know that this has happened.

If the TCP layer cannot transmit the data, the file will not be transferred. So all you need to do is see if the file was transferred.

sameerbo · June 18, 2003, 7:01am

Hi,

That means the occurrence of code 226 is an indication that there has been no data loss.

One of my colleagues suggested that I should transfer another dummy file of, say, 1 byte and check the status of both files. If I get an OK status for both files, then that means there is no data loss.
According to him, this has been done in another project and it has been successful.

Well, I am not quite convinced!! :rolleyes:
If return code 226 itself is an indication of no data loss, then why should I transfer another file and complicate my code?

What do you say? Can this new technique be of any help?

TiA,
Sameer.

RTM · June 18, 2003, 9:37am

Transferring another file is a good idea IF you put verifiable information about the main file you are sending into it. On an ftp from one UNIX system to another, I put the output of ls -s mymainfile.dat into filesize.dat. Then, on the receiving side, the cron job that runs waits for filesize.dat before it starts processing (this also ensures that the main file is complete, since it went first).
It reads the recorded size of the file and compares it with the size of the file that actually arrived.
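
The size check described above could be sketched roughly as follows. This is only an illustration, not the exact script used: the function names write_size_file and verify_size are made up, and it records a byte count with wc -c rather than the block count that ls -s prints, on the assumption that byte counts compare more reliably across systems.

```shell
#!/bin/sh
# Hypothetical helpers; the file names passed in are placeholders.

# Sender side: record the byte count of the main file in a companion file.
# $1 = main file, $2 = companion size file
write_size_file() {
    wc -c < "$1" | tr -d ' ' > "$2"
}

# Receiver side: succeed only if the main file's byte count matches
# the count recorded in the companion file.
# $1 = main file, $2 = companion size file
verify_size() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    # Strip blanks and line endings in case the size file went over in ASCII mode.
    expected=$(tr -d ' \r\n' < "$2")
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$expected" = "$actual" ]
}
```

The sender would call write_size_file mymainfile.dat filesize.dat before the transfer and send the main file first; the receiving job would call verify_size mymainfile.dat filesize.dat once filesize.dat shows up.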

You could (if going from UNIX to UNIX) also use the sum command.

Perderabo · June 18, 2003, 10:50am

The algorithms that sum originally used have been discredited. If you use the sum command, use -p (where your version of sum supports it) so that you get a CRC check. Or just use cksum instead. The CRC algorithm is the champ as far as detecting data corruption. The TCP layer performs a comparable integrity check on every arriving packet, although it uses a simpler 16-bit checksum rather than a CRC.
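
A minimal sketch of a cksum-based check, assuming cksum is available on both machines. The function names write_cksum_file and verify_cksum are hypothetical. Reading the file on standard input keeps the file name out of the cksum output, so only the CRC and the byte count are compared.

```shell
#!/bin/sh
# Hypothetical helpers; the file names passed in are placeholders.

# Sender side: store the CRC and byte count that cksum reports.
# $1 = main file, $2 = companion checksum file
write_cksum_file() {
    cksum < "$1" | awk '{print $1, $2}' > "$2"
}

# Receiver side: recompute the checksum and compare it with the stored one.
# $1 = main file, $2 = companion checksum file
verify_cksum() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    # Drop carriage returns in case the checksum file went over in ASCII mode.
    expected=$(tr -d '\r' < "$2" | awk '{print $1, $2}')
    actual=$(cksum < "$1" | awk '{print $1, $2}')
    [ "$expected" = "$actual" ]
}
```

Unlike a size comparison, this also catches a file that has the right length but different contents, for example a binary file that was sent in ASCII mode.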

The only time I have ever seen the "flag file" used is to know when the last of a collection of files has arrived. Once the flag file appears, you know that you have everything and you can use the data. I do not understand your colleague's advice in this case. You will have to ask him to explain.
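
On the receiving side, waiting for a flag file might look something like the sketch below. The function name wait_for_flag and its timeout and interval arguments are assumptions made for illustration; the sender is assumed to transfer the flag file last.

```shell
#!/bin/sh
# Hypothetical helper: poll until the flag file appears or the timeout expires.
# $1 = flag file, $2 = timeout in seconds, $3 = poll interval in seconds (default 5)
wait_for_flag() {
    flag=$1
    timeout=$2
    interval=${3:-5}
    waited=0
    while [ ! -f "$flag" ]; do
        # Give up once we have waited at least the requested time.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

A receiving job could then run something like wait_for_flag alldone.flag 600 10 and start processing the collection of files only if it returns success.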

Nor do I understand your fixation on the 226 return code. I am unable to tell if you think 226 means that the file was transferred OK or if you think that 226 means there was a problem. But that doesn't matter; neither view is correct.

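According to the RFC (RFC 959), the reply is defined as:

226 Closing data connection. Requested file action successful (for example, file transfer or file abort).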

226 really just means that the data connection is closed. In the case of an abort, it is clear that the file has not been transferred. It is common to close the data connection after a file has been transferred, but it is not required. The data connection is not held open for all eternity under any conditions. If you open a data connection, the time will come that it is closed.

The RFC also says:

The data connection shall be closed by the server under the conditions described in the Section on Establishing Data Connections. If the data connection is to be closed following a data transfer where closing the connection is not required to indicate the end-of-file, the server must do so immediately. Waiting until after a new transfer command is not permitted because the user-process will have already tested the data connection to see if it needs to do a "listen"; (remember that the user must "listen" on a closed data port BEFORE sending the transfer request). To prevent a race condition here, the server sends a reply (226) after closing the data connection (or if the connection is left open, a "file transfer completed" reply (250) and the user-PI should wait for one of these replies before issuing a new transfer command).

Paying close attention to the arrival of a 226 would be crucial if you were writing an ftp client. That is not what you're doing.