First of all, thanks for your help with my last problem; it is solved now. But I have another problem.
This time I transferred a big file (~3.5 GB) with FTP from a Sun machine to a Linux box running Red Hat 7.3, but the file received on the Linux box is corrupt. With smaller files there is no problem; I tested with a 130 MB file.
On the Linux box I am using the default wu-ftpd server with its default configuration; the only setting I changed was the server timeout, which I raised to 24 h.
I had a problem just like that once. The strange thing we found was that the disk was broken, even though fsck didn't report any errors (if you haven't run fsck on the disk, you should try that). When we made a copy of the corrupted file and checked its md5 sum against the original, the sums didn't match; that was one weird problem. You could give it a try: copy the file to the same disk under a different name and md5-check it, or have the Sun FTP the file to another disk on the Linux box and see if the file still gets corrupted.
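A minimal sketch of that same-disk copy test, assuming Linux with md5sum available; a scratch file stands in for the real transferred file, so substitute your own path:

```shell
#!/bin/sh
# Same-disk copy test: a scratch file stands in for the transferred file.
src=$(mktemp /tmp/bigfile.XXXXXX)
dd if=/dev/urandom of="$src" bs=1024 count=4096 2>/dev/null   # ~4 MB stand-in

# Copy to a different name on the same disk, then compare checksums.
cp "$src" "$src.copy"
orig=$(md5sum "$src" | awk '{print $1}')
copy=$(md5sum "$src.copy" | awk '{print $1}')

if [ "$orig" = "$copy" ]; then
    echo "checksums match"
else
    echo "checksums differ: suspect disk, controller, or RAM"
fi
rm -f "$src" "$src.copy"
```

If the sums differ on a plain same-disk copy, the FTP server is not your problem.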
I don't know if that's already solved, but if not, I would check whether your NICs show "ierrs". Sometimes data corruption occurs during packet transmission; we had some problems with that here. Here follows an example:
# netstat -ni
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 127.0.0.0 127.0.0.1 124727 0 124727 0 0 0
ge0 1500 172.19.148.0 172.19.151.85 410543681 95593 221423296 0 0 0
ge1 1500 172.19.148.0 172.19.151.84 259411750 0 161355918 0 0 0
ge2 1500 10.152.231.0 10.152.231.150 258896122 0 496691048 0 0 0
hme0 1500 20.10.1.0 20.10.1.40 781 0 3 0 0 0
On ge0 (a gigabit interface) we had some "ierrs" while FTPing, and that corrupted the data. The FTP did not return any error and completed successfully, but we still had a corruption problem. Since we have two NICs on the same network (Sun's IPMP setup), I forced my FTP connection to the other NIC (ge1) and the data corruption was gone. Maybe this helps.
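On the Linux end the same per-interface error counters that `netstat -ni` prints on the Sun are available in /proc/net/dev; a rough sketch to flag any interface with receive errors (column positions assume the standard /proc/net/dev layout):

```shell
#!/bin/sh
# Flag NICs with receive errors. In /proc/net/dev the fourth receive
# column is the errs counter; the first two lines are headers.
awk 'NR > 2 {
    sub(":", " ")                 # split "eth0:" into its own field
    if ($4 + 0 > 0)
        printf "%s: %s receive errors -- try another NIC or cable\n", $1, $4
}' /proc/net/dev
```

No output means no receive errors; a nonzero count on the interface carrying your FTP traffic would match the symptom described above.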
1) Have you tried to gzip or tar the file before the transfer?
2) Have you tried to copy it to another host to check whether it is a platform issue?
3) Can you establish an rlogin/rcp relationship and do a direct copy from one machine to the other?
4) If possible, can you reset the NIC to see whether that clears any errors? Of course this may interrupt normal data flow while the card is resetting.
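Suggestion 1) also buys you an end-to-end integrity check: gzip stores a CRC, so a corrupted archive fails decompression, and shipping an md5 sum alongside catches anything the CRC misses. A sketch, with a scratch file as a placeholder for the real payload:

```shell
#!/bin/sh
# Compress before transfer and record a checksum so corruption is
# caught on the receiving side. The scratch file is a placeholder.
src=$(mktemp /tmp/payload.XXXXXX)
dd if=/dev/urandom of="$src" bs=1024 count=2048 2>/dev/null

gzip -c "$src" > "$src.gz"                      # sending side: compress
md5sum "$src" | awk '{print $1}' > "$src.md5"   # checksum of the original

# ... transfer "$src.gz" and "$src.md5" with ftp in binary mode ...

gunzip -c "$src.gz" > "$src.out"                # gunzip aborts on a bad CRC
got=$(md5sum "$src.out" | awk '{print $1}')
want=$(cat "$src.md5")
if [ "$got" = "$want" ]; then
    echo "transfer verified"
else
    echo "file is corrupt"
fi
rm -f "$src" "$src.gz" "$src.md5" "$src.out"
```

Remember to use binary mode in the FTP client for both files; ASCII mode would mangle the gzip archive.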
Today I received the answer from our development team regarding the data corruption problem.
The culprit is XFS: the XFS version we use has a bug. See the detailed answer below:
--- snip ---
We have now found the cause of the data corruption.
It is an original bug in XFS.
A particular timing of the deletion of a block and the
flush operation from memory leads to the data corruption.
To be prepared for the next I/O operation, XFS pre-allocates more
data in memory than is actually needed.
Sometimes unnecessary blocks are also removed from the pre-allocated
data in memory to use memory resources efficiently.
This happens when memory resources are fully utilized.
When this "pre-allocation" and "removal of unnecessary data"
occur with a certain timing, the next data to be processed is written
into an area that was just removed from the pre-allocation, which
shouldn't happen. Since the data is written to the wrong place, the
block in memory that should contain the data does not contain the
actual data, which causes the corruption when it is flushed to disk.
The following conditions are needed for the bug to appear:
- memory is fully loaded
- I/O is done via NFS
- a large file (on the order of GB) is being written continuously and concurrently
--- snap ---
Maybe someone else has a similar problem, so thanks a lot to all, and have a nice day.