cp to copy only non-corrupt files

LMHmedchem · December 21, 2011, 12:47am

I don't know if I am asking this correctly, but I have a hard drive with some bad sectors and it appears that some of the data is corrupt. I am having a lot of trouble copying the data to a new drive. The issue is not in copying files, but that the new drive to which files are copied is not acting in a stable manner after the files are copied to it. Check disk runs every time I restart, but stops with an error before it finishes. Data on the drive will be good, but after a couple of restarts, the same data will be corrupt and files won't open.

I realize that the problem could be the drive, but it seems more complicated than that. There is only one partition on the drive that is causing problems. There is a second partition on the drive that check disk does not run on.

It would be very helpful if I could confirm that all the files I am copying to the drive are non-corrupted files and skip those that are. I don't know if there is any way to test the files before they are copied. I know that sometimes you can't change permissions on corrupt files, or can't open them, but I don't know how that helps.

Suggestions would be appreciated.

LMHmedchem

Corona688 · December 21, 2011, 12:56am

Ordinary files don't start malfunctioning just because you're copying them from a dying drive. Their contents may be suspect, but they're not magic; their badness doesn't leak into the filesystem at large. Bad files don't have the power to corrupt good filesystems when copied.

This means, I suspect, you've got bigger problems than a dying drive. Your system itself may be corrupting data somewhere along the line.

My approach to rescuing this would be to remove both drives and install them into a scratch computer. Doesn't have to be a good computer, as long as it can boot a rescue CD of some sort. Then block-copy the old drive onto the new one, raw. This will overwrite all current contents, and it must be equal or greater size. Use dd_rescue if you have it, dd conv=noerror,sync if you don't.

If your drive has bad sectors, they'll stick out during this process, but that can't be helped. dd and dd_rescue will fill in bad blocks with pure zeroes when they can't be read. The resulting blind copy may be good enough to mount and recover data from.
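A minimal sketch of that raw copy, run against scratch image files rather than real disks so it is safe to try. The file names and the 1 MiB size are made up for illustration. On real hardware `if=` and `of=` would be the whole-disk devices, and swapping them would overwrite the drive you are trying to rescue.

```shell
#!/bin/sh
# Demonstration on scratch image files, NOT on real disks.
set -eu

work=$(mktemp -d)
src="$work/old_disk.img"   # stands in for the failing drive (hypothetical name)
dst="$work/new_disk.img"   # stands in for the replacement drive (hypothetical name)

# Build a fake 1 MiB "old drive" full of random data.
dd if=/dev/urandom of="$src" bs=4096 count=256 2>/dev/null

# Raw block copy.
#   conv=noerror : keep going after a read error instead of stopping
#   conv=sync    : pad every short or failed input block with zeros,
#                  so later data stays at the same offset on the copy
dd if="$src" of="$dst" bs=4096 conv=noerror,sync 2>/dev/null

# With a healthy source the copy should be byte-for-byte identical.
if cmp -s "$src" "$dst"; then
    echo "copy is identical to source"
fi
```

A smaller `bs` loses less data around each bad spot, since a whole block is zero-filled when any part of it fails to read, at the cost of a slower copy.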

If it didn't have any bad sectors, it probably means it was a good drive but being fed mangled data. Bad RAM perhaps, causing operating system malfunctions?

Only then, once your data isn't in danger of flopping over and dying the more you touch it, should you start playing around with it.

How's it supposed to tell "good" files from "bad" ones, by the way?

LMHmedchem · December 21, 2011, 12:24pm

That is more or less what I thought, but since the drive seemed to work alright until I transferred a lot of data, I wasn't so sure. In the last effort, I did a low level format, and then copied about 1GB of data onto the drive. Then I restarted and check disk ran. It found some errors, fixed them, and then finished. On subsequent restarts, check disk didn't run, so I thought I was in the clear. I was able to open files and use apps in the data I had copied. Then I copied about 50GB more data and restarted. The same check disk cycle started, but this time it wouldn't finish. After restart, some of the original 1GB of data was corrupted and those apps would fail to run. There are many variables here, so the logical thing to do would be to try to ensure that the fault was not in the data being moved. The problem is that the data is on a drive with bad sectors, but it works in the main. Check disk does not run on every startup when that drive is in the machine, which it should if the source file system is really borked.

This is a problem that is proving difficult to diagnose. There are two other platter drives and an SSD in this box and they are not acting up at all. This leads me to believe that the new drive is just bad. The fact that the drive passes WD's diagnostic software makes that a bit less clear. Memory, the motherboard sata controller, sata cables, power supply, operating system, etc, are all other places where the problem could reside. In most of those cases, I would expect the problem to be more widespread. I moved the drive off of the motherboard sata controller and onto a brand new PCI sata card in case the controller was going.

Is this something that I could do in windows cygwin, or would a flavor of linux be better? I have Cent and Ubuntu on one computer here. I have some live linux CDs, but the computers I could use those on are number crunching servers that don't have space for hard drives. Another issue is that once I have moved data onto these drives, when I delete the partition, I can't create a new one with a quick format. After this blew up again last night, I deleted the partition on the drive. When I replaced it, windows couldn't format the new partition. The format failed. This happened before and I had to do a low level format to get it back. That takes about 6 hours, so it's not a trivial step.

Yea, I'm not sure. I know that I get OS messages about corrupt files from time to time. I guess you could try to open the file with the default app and that would trigger some exceptions if the file is bad. I guess you could try chown or chmod, I have got some error messages about this not working on files when they may be bad. Anything like that would take forever.

At this point, I am inclined to RMA the drive (I have an open ticket on it) and do the dd_rescue copy with the new drive. What do you think about that?

LMHmedchem

Corona688 · December 21, 2011, 1:04pm

'low level formatting' hasn't been possible anywhere but the factory for decades now. What did you actually do?

Checking what? The bad drive, or the new one?

Only you'd know whether your data's any good. If your application can't tell you, then nobody knows. Application errors can't corrupt a filesystem, though. That takes a hardware or kernel fault. (Checking dmesg may be illuminating.)

And if you're getting data corruption on good disks, something in that server must be malfunctioning, therefore any backups you make using that server are suspect. The longer you keep toying with the original disk in the original machine, the more likely it gets that something worse will happen to your data.

Does your system have lots of free memory? If yes, most of it's going to be used as disk cache. That makes pretty good odds that disk will be the first thing trashed by a bad spot in RAM, in a highly unpredictable way.

Which PCI sata card? It's easy to get a lemon.

That basically means doing it in Windows since Cygwin isn't an operating system. It might technically be possible in windows but there'd be lots of hoops to jump through and proprietary software nobody would know how to help you with.

centos or ubuntu should do.

I wouldn't recommend using Microsoft Windows to manage partitions for any system except Microsoft Windows.

Again, what do you mean by "low level format"?

The form of backup I'm thinking of wouldn't need partitions on the destination disk at all. It'd just be a raw dump of data from one disk to another, sector by sector, which clones all partition layout in the process.

Um, dd_rescue first, then RMA. You kind of need the drive to make a copy of it.

dd_rescue will also tell you whether you get read errors or not.

methyl · December 21, 2011, 4:42pm

chkdsk.exe is a very basic Microsoft program. Unless you run it manually it is triggered by a crude mechanism which decides whether there were incomplete disc writes.

What Operating System did you use to format this disc? Can we assume that this new disc is formatted NTFS rather than basic FAT? If not, it will not be able to deal with large files.
How did you format the disc? Did you run chkdsk.exe on the new disc before using it?

I too am amazed that you have the equipment for a low-level disc format. You will have needed to enter all bad sectors manually.

Because you have posted on unix.com , we must assume that unix is involved somewhere in this process.
Does the source disc belong to the system on which you are trying to do the copy? If not, where did it come from? What is the format of the source disc and what Operating System and software wrote the files on the disc? What proof do you have that the source disc is corrupt? What did you type when trying to copy the files? What error messages do you get? How big is the largest file (especially if bigger than 2 GB)?
A detailed hardware and software inventory would help. I wonder if you are fitting modern disc drives to an old computer?

LMHmedchem · December 26, 2011, 1:34pm

The format was done under windows XP 32-bit. This is a multi boot box, but this data drive is primarily used for windows data. It does have a second NTFS partition that I share with linux installations in other boot partitions. Check disk never ran on that partition.

Yes it was NTFS

Normally I create partitions using EASEUS partition master (v9.1). I believe that this does a quick format by default. After the drive started acting up, I reverted to creating and formatting partitions with windows disk manager. It is an adequate tool if all you need to do is to create or delete partitions. I tried both long and quick formats. One curious thing is that after the drive started acting up, I deleted the partition in windows, but I could not quick format a new partition after creating one. I got an error that the format failed. If I did a long format, it would finish and the drive was usable. That suggests bad sectors that a quick format can't work around, but the WDLD tool doesn't find bad sectors, nor does HDtune free.

I did not run checkdisk on the drive before using it. Is that a standard practice? I guess it makes sense, I try to test most of my other components. I have run the WDLD tool on some of my new drives before but I can't remember if I did it this time.

A hardware forum suggested to me that a low level format may fix issues, especially if there was a problem in the partition tables or MBR. It would also work around bad sectors if possible. I just used a software tool called HDD Low Level Format Tool (HDDGURU: Laptop and Desktop Hard Disk Drives, Tests, Software, Firmware, Tools, Data Recovery, HDD Repair). I don't know that this does much of anything different than the windows long format, but it does remove the MBR. I had to activate the drive in windows after running the tool. It also took like 10 hours to run.

I have cygwin installed and make significant use of it, so I do a lot of things in bash. I copied the files from the bad source disk to the replacement drive using,

cp -Rfp sourceDir/ destinationDir/ >& sourceDir_copylog.txt

This has been my standard procedure for quite a while. It is much faster than using any windows tool and it doesn't quit if it suddenly runs into a file it can't copy. The redirected stderr and stdout give a record of any files that couldn't be copied. This box also has ubuntu, cent, scientific, and suse installed, so those are available if there are some native linux techniques to try.
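One way to get closer to "copy only the files that can be read" is to try reading each file in full before copying it, and log the ones that fail. This is only a sketch: `copy_readable`, the directory names, and the log format are made up for illustration, and it assumes file names without embedded newlines. A file that reads without error can still hold wrong data, so this catches I/O errors and nothing more.

```shell
#!/bin/sh
set -eu

# copy_readable SRC DST LOG  (hypothetical helper, not a standard tool)
# Copies every regular file under SRC that can be read end to end,
# and appends the path of every file that cannot to LOG.
copy_readable() {
    src=$1 dst=$2 log=$3
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        # Read the whole file once; an I/O error makes cat fail.
        if cat "$src/$f" > /dev/null 2>&1; then
            mkdir -p "$dst/$(dirname "$f")"
            cp -p "$src/$f" "$dst/$f" || echo "COPYFAIL $f" >> "$log"
        else
            echo "UNREADABLE $f" >> "$log"
        fi
    done
}

# Small self-contained demonstration on scratch directories.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
echo "hello" > "$work/src/a.txt"
echo "world" > "$work/src/sub/b.txt"
: > "$work/copylog.txt"

copy_readable "$work/src" "$work/dst" "$work/copylog.txt"
```

Reading every file twice is slow and puts extra wear on a failing drive, which is a reason to prefer a single block-level copy first and sort out individual files from the copy.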

The source disk is a windows NTFS drive of the same make and model (WDCB 1TB 6Gb/s). Partitions on the source disk were created and formatted using EASEUS.

I run rsync every night to backup the data drive to a backup drive in the same box. About a month ago, I noticed the rsync wasn't finishing and it seemed like the issue was with the destination drive. I ran the WDLD tool on it and it failed the short test. I ran the long test and it said there were bad sectors that it tried to fix. The tool failed trying to fix the bad sectors, so I RMAd the drive. I have an external backup of the same drive, so when the replacement arrived, I used cp to restore my internal backup from my external. The internal backup drive has worked well since.

Recently I noticed that there were some issues with the primary data drive, so I ran WDLD, found errors, and RMAd that drive as well. This was the same bad sectors error that I got on the backup drive.

This new problem arose when trying to copy data onto the replacement for the backup drive when it arrived. I did the same file copy with cp -Rfp as before.

I'm not sure how big the largest file is. I have some linux iso files stored on this drive, and those are pushing 5GB. Those are probably the largest thing I have.

The hardware is as follows,
PSU: CORSAIR CMPSU-750TX 750W
MOBO: GA-EP45T-DS3R f3 BIOS
CPU: Q9550
RAM: 2x2GB DDR3 OCZ3RPR13334GK 1333MHz, 6-6-6-20, 1.75v
GPU: EVGA 896-P3-1257-AR GeForce GTX260 Core 216
SSD-OS1: OCZ VertexII 60GB, WinXP-32bit sp3
HDD-OS2: VelociRaptor 150GB, Ubuntu 10.10 64-bit, CentOS 5.1 64-bit, Suse 12.1 64-bit
HDD-Data: 1TB Western Digital Caviar Black
HDD-Backup: 1TB Western Digital Caviar Black (10GB pagefile partition)

As far as software, I'm not sure what is relevant, but I have cygwin installed for the gnu compilers, Zone Alarm ISS, java JRE, eclipse, MS office, Adobe CS, a bunch of chemistry and statistics tools, and various system tools.

I wouldn't call this an old computer by any means, but it is certainly not the most current hardware either. I found on another forum that there is an issue with using the western digital 6Gb/s SATA III drives on a SATA II controller. These are all supposed to be backward compatible, but apparently you need to add a jumper to restrict the drive to 3Gb/s. Others have reported bad sectors popping up over time without the jumper. It would have been nice for WD to advertise that a bit better. I am confident that is what was causing the bad sectors to pop up in the first place. But I put a jumper on the replacement drive that is acting up now and that didn't help.

At this point, I am able to use all of the other drives in this box, so I am inclined to think that the problem is not with the SATA controller. I have also run memtest 86+ and it didn't find anything wrong with the memory. I will run a long prime95 later today to see if my system is generally stable.

At this point, I have the data drive with the bad sectors unplugged and I'm waiting for an RMA of the replacement drive.

Corona688, this is the replacement that I did an RMA on, I still have the original bad sector drive that I'm trying to get the data off of. If I plug that drive in, it works and I can open files and such. I don't know what data on it is affected by the bad sectors, so I've left it unplugged for now. The rest of the computer seems to work fine. I can boot in to the OS, run apps, etc, and the other drives in the box aren't triggering checkdisk.

My plan is to load the bad sector hdd and the RMA replacement drive into a new computer I have and use ubuntu live linux to do dd_rescue to try to recover the data. Then I will boot windows in the new computer and see if the new drive is stable. If it is, I will put it into the suspect machine and see what happens. Hopefully the replacement drive I got was just bad and that is all there is to it.

Did I answer everyone's questions? Sorry for the delay in response, it has been a busy weekend.

LMHmedchem

LMHmedchem · December 27, 2011, 8:46pm

I have the new replacement drive in a new computer along with the old drive with the bad sectors. I have ubuntu 10 loaded from a flash stick and I installed ddrescue (the gnu version I think).

I'm not sure how to go about a device to device copy. I believe that the new drive (unformatted, no partitions) is sdc and the old drive with the data is sda. It would be nice to confirm this.

Can someone point me to a tutorial on how to do this or post a list of instructions?

The source disk with the bad sectors is NTFS with two partitions. It seems like it should be something like,

ddrescue -f -n /dev/hda /dev/hdc logfile

The example indicates this is for ext2 partitions, so I don't know if you need to do something else for NTFS.

LMHmedchem

Corona688 · December 27, 2011, 9:21pm

As already explained, dd_rescue doesn't care at all about filesystems, it makes a raw bit-for-bit copy. This is the entire point of using it, because it will not seize up and start foaming at the mouth when the filesystem isn't completely valid. You can copy filesystems from any OS. So if there are actual bad sectors on the drive, you can use dd_rescue to copy it from the bad drive to the good, then commence data recovery on the new drive.

ddrescue inputdevice outputdevice

Try fdisk -l to see which device is which.

LMHmedchem · December 27, 2011, 9:36pm

Alright thanks, I will try in a few minutes. I assume I want a log file.

What kinds of things should I do to check the new disk after I make the data dump?

LMHmedchem

Corona688 · December 27, 2011, 9:40pm

No point keeping one unless you have something to save it on. You can't save it on the old disk, you can't save it on the new disk, what exactly would you save it on?

Look at the value of errxfer during/after transfer. If it's ANYTHING but zero, there were bad sectors on the source disk during transfer.

Depends what it is and what's on it. My expertise is with UNIX systems, not Windows ones. chkdsk may be able to successfully run on the new drive if it couldn't on the old one. Of course, my advice would be to hook it to a system which can mount the partition and then just recover your data before you get too carried away...

LMHmedchem · December 27, 2011, 11:11pm

I wrote the logfile to /home/ubuntu/desktop/logfile. Since I am running ubuntu off of a flash drive, I presume that is a writeable location, otherwise it is in memory, but the file does appear on the desktop.

So far it has recorded 24 errors. I guess it is unclear to me what the data on the new disk will look like when there were errors and how I would find files that are now unreadable, etc.

Once the ddrescue is finished, I can hook up an external drive and copy the data from inside ubuntu. Will it be ok to just use cp, or should I use some other method to move the data to the external drive? What will cp do if it runs into corrupt files?

Once I have made a backup of the new drive, I will boot windows on the new computer and see what it makes of the drive. If it is fine, or if checkdisk runs and completes, I will put the drive back into the original computer and see if there are still issues or if the problems have resolved.

LMHmedchem

LMHmedchem · December 28, 2011, 11:47am

The ddrescue run is finished, I have attached the log file if anyone is interested.

This is the summary,

Summary for /dev/sda -> /dev/sdc:
dd_rescue: (info): ipos: 976762560.0k, opos: 976762560.0k, xferd: 976762560.0k
                   errs:     24, errxfer:        12.0k, succxfer: 976762560.0k
             +curr.rate:     3064kB/s, avg.rate:    37374kB/s, avg.load: 10.9%

I'm going to copy the data from the new drive to an external backup drive now and see how that goes. Is there anything I can do to check the new drive to look for files that are incomplete or damaged?
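For reading that summary, a rough sketch that pulls `errs` and `errxfer` out of the text and works out how many bytes were lost per error. The summary text is pasted from the run above. The `sed` patterns are written for this exact layout, and treating `k` as 1024 bytes is an assumption about how dd_rescue reports sizes.

```shell
#!/bin/sh
set -eu

# The two relevant lines of the summary, copied from the run above.
summary='dd_rescue: (info): ipos: 976762560.0k, opos: 976762560.0k, xferd: 976762560.0k
                   errs:     24, errxfer:        12.0k, succxfer: 976762560.0k'

# Number of read errors and amount of data that could not be read (in k).
errs=$(printf '%s\n' "$summary" | sed -n 's/.*errs:[[:space:]]*\([0-9]*\),.*/\1/p')
errxfer_k=$(printf '%s\n' "$summary" | sed -n 's/.*errxfer:[[:space:]]*\([0-9.]*\)k.*/\1/p')

# Bytes lost per error, assuming 1k = 1024 bytes.
bytes_per_err=$(awk -v k="$errxfer_k" -v n="$errs" 'BEGIN { printf "%d", k * 1024 / n }')

echo "errors: $errs, unreadable: ${errxfer_k}k, bytes per error: $bytes_per_err"
# prints: errors: 24, unreadable: 12.0k, bytes per error: 512
```

Under that assumption the result is 512 bytes per error, which is consistent with each error being a single unreadable 512-byte sector that was replaced by zeros on the new drive.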

LMHmedchem

LMHmedchem · December 28, 2011, 12:39pm

I'm running cp -Rfp (in Ubuntu) to copy the data from the new drive to an external backup drive.

There is one error,
cp: cannot access `Data_Old/Data_Applications/seamonkey2_profiles/yahoo/newstuf/Cache.Trash/Trash/Cache': Input/output error

I presume this means that the file in that path is corrupted. What can I do to get rid of that bad file entry? That is not a file that I need, if it was, I would go to my third backup and get a good copy of it.

LMHmedchem

Corona688 · December 28, 2011, 1:13pm

lmhmedchem:

The ddrescue run is finished, I have attached the log file if anyone is interested.

This is the summary,

Summary for /dev/sda -> /dev/sdc:
dd_rescue: (info): ipos: 976762560.0k, opos: 976762560.0k, xferd: 976762560.0k
   errs:     24, errxfer:        12.0k, succxfer: 976762560.0k
   +curr.rate:     3064kB/s, avg.rate:    37374kB/s, avg.load: 10.9%

Good. The drive had bad sectors and needed replacing, then -- and extremely few sectors were actually bad. That's fairly decent odds that next to none of them landed on anything important -- though that's just down to luck.

Well, copy them and see.

It's possible a block of zeroes might have landed right in the middle of a file of course. How to check for that depends on how you'd normally verify your data.
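If a separate known-good backup exists, one way to check is to compare the recovered tree against it file by file. A minimal sketch, where `compare_trees` and the directory names are made up for illustration and file names are assumed not to contain newlines. Files that were legitimately edited after the backup was taken will also show up as different, so the report is a list of things to look at, not a list of damaged files.

```shell
#!/bin/sh
set -eu

# compare_trees RECOVERED BACKUP  (hypothetical helper)
# Prints one line for every file under RECOVERED that is missing
# from BACKUP or whose contents differ from the backup copy.
compare_trees() {
    recovered=$1 backup=$2
    ( cd "$recovered" && find . -type f ) | while IFS= read -r f; do
        if [ ! -f "$backup/$f" ]; then
            echo "NOBACKUP $f"
        elif ! cmp -s "$recovered/$f" "$backup/$f"; then
            echo "DIFFERS $f"
        fi
    done
}

# Small demonstration: one intact file, one file hit by a run of zeros.
work=$(mktemp -d)
mkdir -p "$work/recovered" "$work/backup"
echo "same"   > "$work/recovered/ok.txt"
echo "same"   > "$work/backup/ok.txt"
echo "intact" > "$work/backup/hit.txt"
printf '\000\000\000\000\000\000\000' > "$work/recovered/hit.txt"

report=$(compare_trees "$work/recovered" "$work/backup")
echo "$report"
# prints: DIFFERS ./hit.txt
```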

Also possible is the filesystem structure itself being disrupted.

---------- Post updated at 12:13 PM ---------- Previous update was at 12:10 PM ----------

No, it means the filesystem is corrupted. A bad file would just have chunks of zeroes inside it. There's no magic combination of zeroes and ones inside a file which causes it to be unable to be read.

Probably something in the filesystem metadata got filled with a chunk of zeroes when it couldn't be read. But now, instead of retrying and retrying for minutes at a time until it gives up on a bad sector, it's actually a good sector filled with nulls where the filesystem was expecting directory entries or inodes or something. So it gives up instantly.

Depending on what the partition is and how badly it's damaged, fsck, chkdsk, or reformat. Same as any other partition.

LMHmedchem · December 28, 2011, 1:28pm

Thanks a bunch, this is the kind of thing I hate to try to navigate through on my own. :wall:

Since the file system on my new drive has issues after the dd, it seems I should do something like the following.

Try a repair tool (fsck, chkdsk)
If that doesn't work, reformat the new drive and copy data from my external backup back on to the new drive.
(the data that made it onto my external drive should be fine since cp can't copy a file system error)

Since this is an ntfs partition, should I try chkdsk in windows first, or can I try fsck from Ubuntu, or does it matter?

LMHmedchem

Corona688 · December 28, 2011, 1:37pm

I think you misunderstand slightly.

It's possible for a file that copied fine to have a chunk full of nulls in the middle of it because of the errxfer. That won't turn it into a mysterious undeleteable file. It'll be an ordinary file that doesn't misbehave with cp, but its contents could be other than what you were expecting.
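Without a backup to compare against, a crude hint is to look for a block inside a file that is entirely zeros. This sketch is an illustration only: `first_zero_block` is a made-up helper, the 512-byte block size is an assumption about the sector size, and it is slow because it calls `dd` once per block. Plenty of healthy files (ISO images, executables, databases) contain zero-filled blocks, so a hit means "worth a look", not "damaged".

```shell
#!/bin/sh
set -eu

# first_zero_block FILE [BLOCKSIZE]  (hypothetical helper)
# Prints the index of the first block that consists only of zero bytes
# and returns 0; returns 1 if no such block is found.
first_zero_block() {
    file=$1 bs=${2:-512}
    size=$(wc -c < "$file")
    blocks=$((size / bs))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Count the bytes left in this block after deleting every NUL.
        nonzero=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | tr -d '\000' | wc -c)
        if [ "$nonzero" -eq 0 ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Demonstration: random block, zero block, random block.
work=$(mktemp -d)
sample="$work/sample.bin"
{
    dd if=/dev/urandom bs=512 count=1
    dd if=/dev/zero    bs=512 count=1
    dd if=/dev/urandom bs=512 count=1
} 2>/dev/null > "$sample"

hit=$(first_zero_block "$sample")
echo "first all-zero block: $hit"
```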

These undeleteable files had corruption in the filesystem itself.

Since there were only 12 kilobytes of bad sectors -- which were as likely to happen in empty space as inside a file unless the drive was over 50% full -- the odds of this happening are hopefully low.

For block copies the contents don't matter, but if you want to actually check the filesystem contents, use windows tools to check windows filesystems.

LMHmedchem · December 29, 2011, 2:40pm

Well I'm getting closer. After the cp to the external drive finished, there was only one IO error noted. I do get the fact that there could be other files with some 0s instead of whatever used to be there. I was just commenting that the IO error from cp was from a problem with the file system on the src drive and not the file that was being copied. Also that I shouldn't expect corruption of the filesystem on the external drive resulting from the cp.

I booted windows on the new computer and chkdsk ran and finished this time, so it looks like the new drive is good to go, plus I have the data on another external.

...then I put the new drive back into the original box and tried to boot windows. I got to the windows xp screen and the progress bar was slowly lurching across (which I have never seen) and it never did get past that point. I had used that computer several times that day (without the data drive), so that was odd. I decided to unplug all the drives except the ssd with windows. I forgot that grub2 was installed on one of the other hdd MBRs, so I got a device not found error from grub2.

I have not been able to get grub working yet. I tried to get the hdd with grub2 back into the sata port it was in, but I didn't have any luck with that. I ended up connecting all of my drives to the sata pci card and booting with a supergrub2 disk. I ran the search for grub2 installations and then loaded the file it found. I was able to boot into both windows and ubuntu doing this. I did update-grub in ubuntu, and it found all the OSs, but I still don't get the grub menu on boot without using the supergrub2 cd.

I'm not sure if I need to uninstall and re-install grub, or if grub has a problem when the drive is on a pci controller, or what.

I realize this has gone well beyond my original question about a cp script, so if anyone thinks I should move this conversation to a different forum, just let me know.

LMHmedchem

---------- Post updated at 02:40 PM ---------- Previous update was at 12:02 PM ----------

I was able to use boot-repair from a CD,
https://help.ubuntu.com/community/Boot-Repair

and this got my grub menu back by re-installing. I am not getting check disk running on restarts, so it looks like everything is resolved for now. I am running everything off the sata pci card, so I'm still not sure if my motherboard sata controller is dead or not.

LMHmedchem

Corona688 · December 31, 2011, 12:09pm

That there were actual bad sectors on the drive kind of points to the drive controller not being the defective part...

Windows may need to be reinstalled if data was corrupted. How else can you know you have a (relatively) dependable system?