write() issue during a low level hdd access

Hi,

I am trying to write zeroes to the HDD using a C program. I don't want to use dd, ddrescue, or any similar existing program, because I need things like real-time progress and custom write patterns. (My program is more like an erasure application, but for now it only does a zero fill.)

Here are the steps I have followed. (I am running as root.)

The Linux version is Ubuntu 8.10 (Intrepid), with kernel 2.6.27-7-generic.

I opened the device /dev/sda using open():

fd = open("/dev/sda", O_WRONLY);

and then called write() with

write(fd,buff,512);

For some strange reason the write call never fails. It always returns 512, even with a bad HDD connected to the system. (dmesg shows that the HDD is bad or not responding to write requests, has I/O errors or sector errors, or has failed hard / soft reset operations.)

I have tried using errno to get the error value, but nothing reports an error condition. Is there something I am missing?

Following is the code for the write function.

int writesector(const uint64_t offset, const int fd, const uint8_t *mybuff, const uint16_t len)
{
  if(fd < 0)  /* open() returns -1 on failure; 0 is a valid descriptor */
  {
    return 4;
  }
  errno = 0;
  if(lseek64(fd, offset, SEEK_SET) != -1)
  {
    if (errno!=0)
    {
      return 1;
    }
    
    errno = 0;
    
    if( (write(fd, mybuff, len)) != len)
    {
      return 1;
    }
    
    if (errno!=0)
    {
      return 1;
    }
  }
  else
  {
    return 2;
  }
  return 0;
}

Has anyone tried a similar approach to writing to HDDs at a low level?

What else can I check? I am currently out of options for identifying the issue. I have tried implementing a write timeout, but it bombs badly! :frowning:

Any inputs / help on this issue please?

You're not doing low-level I/O here, your write is probably going straight into cache. Try opening with O_DIRECT.
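To illustrate the O_DIRECT suggestion, here is a minimal sketch, assuming Linux with glibc and a 512-byte logical sector size. The helper names (alloc_zeroed_aligned, direct_io_aligned, open_direct) are made up for this example and are not from the original program. O_DIRECT generally expects the buffer address, the transfer length and the file offset to all be multiples of the device's logical block size; when they are not, write() tends to fail with EINVAL.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a zero-filled buffer whose address is a
 * multiple of `align` (a power of two, multiple of sizeof(void *)).
 * Returns NULL and sets errno on failure. */
void *alloc_zeroed_aligned(size_t align, size_t len)
{
    void *buf = NULL;
    int rc = posix_memalign(&buf, align, len);
    if (rc != 0) {
        errno = rc;          /* posix_memalign returns the error number */
        return NULL;
    }
    memset(buf, 0, len);
    return buf;
}

/* Hypothetical helper: 1 if address, length and offset are all
 * multiples of `align`, 0 otherwise. */
int direct_io_aligned(const void *buf, size_t len, uint64_t offset, size_t align)
{
    return ((uintptr_t)buf % align == 0) &&
           (len % align == 0) &&
           (offset % align == 0);
}

/* Hypothetical helper: open a device for unbuffered writes.
 * Returns a descriptor, or -1 with errno set. */
int open_direct(const char *path)
{
#ifdef O_DIRECT
    return open(path, O_WRONLY | O_DIRECT);
#else
    (void)path;
    errno = EINVAL;          /* O_DIRECT not available on this platform */
    return -1;
#endif
}
```

Checking direct_io_aligned() before each write is a cheap way to tell an alignment mistake apart from a genuine device error.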

Please edit your post to put code in code tags like the thread posting page suggests. [ code ] stuff [ /code ] without the spaces in the tags. Preferably before a mod needs to do so for you. :wink:

Surely you could write custom patterns with either dd or dd_rescue though! dd_rescue in particular shows lots of progress info. Just feed them custom patterns on stdin and you'll have your custom patterns.

thanks for the reply.

I know O_DIRECT might work, and I tried it, but it fails for some reason. I aligned the write buffer after getting the block size from the HDD, but it still fails miserably and I don't know how to debug it. I stopped digging into it because it might sidetrack my work. :mad:

gdb is not much help here. For testing, I tried jumping directly to a known bad sector location in the code, but I could not get the write call to fail. (By the way, dmesg starts spewing errors after a certain number of write attempts on the bad sector.) So libata is catching those errors and the kernel is aware of them, but the program does not receive the error.

I have also tried mapping raw devices to the HDDs and ran the same operations, without success.

I wrote a small program which currently works. I am testing it now.

I have been trying to get this to work for a long time with multiple failed options! Having sleepless nights too. :frowning:

Try printing the error message with perror() to find out what "some reason" is. As a stubborn programmer would say, don't give up until you know why it's not working. :wink:
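As a rough sketch of what that could look like (write_reporting is a made-up name, not something from the original program): errno only means something when write() has actually returned -1, so testing it after a successful call, as the posted code does, tells you nothing. The short-write case is mapped to EIO here purely as an assumption for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: returns 0 on a complete write, otherwise an
 * errno value describing what went wrong. */
int write_reporting(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0) {
        int saved = errno;   /* save before stdio can overwrite it */
        fprintf(stderr, "write failed: %s (errno %d)\n",
                strerror(saved), saved);
        return saved;
    }
    if ((size_t)n != len) {
        /* A short write is not an error from write()'s point of view,
         * so errno is not set; report it ourselves. */
        fprintf(stderr, "short write: %zd of %zu bytes\n", n, len);
        return EIO;
    }
    return 0;
}
```

With O_DIRECT and a misaligned buffer, this would typically print "Invalid argument", which already narrows things down a lot.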

I wanted the write to fail only for the bad sectors, but in O_DIRECT mode it was failing for all sectors. I now have a working program. (Well, "working" means I am able to write to the sectors, but I still cannot get the write to fail for the bad sectors.)

As I mentioned in my previous post, I created a small program which writes to the sectors with O_DIRECT. I guess I made some mistake in my original O_DIRECT attempt. (Let's leave that aside! :))

But even with the O_DIRECT and O_SYNC flags in place, the write succeeds for all the bad sectors.

Is there anything that has to be enabled / checked?

Thank you for the replies.

Hmm. Try fsync(fd) after the write. If it returns anything other than zero an error happened while syncing the data to disk.

Here is the code which I am using.

int writesector(const uint64_t offset, const int fd, const uint8_t *mybuff, const uint16_t len)
{
  if(fd < 0)  /* open() returns -1 on failure; 0 is a valid descriptor */
  {
    return 4;
  }
  errno = 0;
  if(lseek64(fd, offset, SEEK_SET) != -1)
  {
    if (errno!=0)
    {
      return 1;
    }
    
    errno = 0;
    
    if( (write(fd, mybuff, len)) != len)
    {
      return 1;
    }
    
    if (errno!=0)
    {
      return 1;
    }
    
    if(fsync(fd)!=0)
    {
      return 1;
    }
    //sync();
  }
  else
  {
    return 2;
  }

  return 0;
}

The buffer I am using is aligned. PATTERN_LEN is currently set to 8192 bytes (i.e., 16 sectors).

The code is:

diskdata = memalign(512, PATTERN_LEN);

if (diskdata == NULL)
{
    printf("FATAL ERROR: Unable to allocate memory for write buffer!\n");
    exit(1);  /* nonzero exit status signals the failure */
}

memset(diskdata, 0, PATTERN_LEN);

The file is opened using:

handle = open64(devicename, O_WRONLY | O_DIRECT | O_SYNC);

I don't know what causes the write call to pass always! :frowning:

And yes, I am using the

#define _LARGEFILE64_SOURCE
#define _GNU_SOURCE

macros in the code so that O_DIRECT and the *64() functions compile.
Thank you.

Well, you're not doing raw I/O. Really only the kernel can do truly raw I/O, and worrying about write errors is the kernel's job.

If writes don't fail, what about reads? Try reading the data back.

read passes too. It does not return an error until the device itself fails.

I just want to catch / count the read / write I/O errors from the program.

With the read, at least, I can identify a failure by comparing the data read from the sector to a known pattern; if it does not match, I can flag the read as failed (a roundabout way!).

Yeah, it is not direct raw I/O, but this is for portability and should work across most operating systems.
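The read-back comparison described above could be sketched roughly like this. verify_block is a hypothetical name, and the 4096-byte alignment of the scratch buffer is an assumption chosen so the same buffer would also be acceptable with O_DIRECT on most devices.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read `len` bytes at `offset` and compare them
 * with `pattern`.
 * Returns 0 if the data matches, 1 if it differs, and -1 if the read
 * itself failed or hit end-of-device (errno is set in that case). */
int verify_block(int fd, off_t offset, const uint8_t *pattern, size_t len)
{
    void *scratch = NULL;
    int rc = posix_memalign(&scratch, 4096, len);
    if (rc != 0) {
        errno = rc;
        return -1;
    }

    int result = 0;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (uint8_t *)scratch + done,
                          len - done, offset + (off_t)done);
        if (n <= 0) {
            if (n == 0)
                errno = EIO;  /* ran out of data before `len` bytes */
            result = -1;
            break;
        }
        done += (size_t)n;
    }

    if (result == 0 && memcmp(scratch, pattern, len) != 0)
        result = 1;           /* read succeeded but data is wrong */

    int saved = errno;        /* free() may clobber errno */
    free(scratch);
    errno = saved;
    return result;
}
```

Keeping "read failed" (-1) separate from "read succeeded but data differs" (1) makes it possible to count the two cases independently.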

Unfortunately, AFAIK, the type of raw i/o that you need to meet your requirements is not available at the user level without writing your own device driver. And device drivers by their very nature are not very portable.

thank you.

So there is no way around it without writing a device driver to handle the read / write operations? Is that right?

I have a utility written in Python which is able to identify the bad sectors while working exactly the same way as the C program. I am not sure whether Python has implemented anything internally (a device driver) to achieve this. I'm currently downloading the Python source code to analyse it.

I am sure C is more low level than Python (which is, by the way, the dumbest statement I've made :)) and should be able to achieve it. It's a little weird that the program fails to identify the bad sectors. I have analysed the following programs so far for such an implementation, and all have the same code.

TestDisk, dd, ddrescue, badblocks, etc.

So does this mean that none of the above user-mode open-source Linux programs are really doing what they claim to do (as data recovery / forensics utilities)? All of the above programs implement O_DIRECT options too.

Well, I guess I am in need of a fix now! :frowning:

It also sounds a little weird that there are no user-mode programs (not even one?) available in Linux that can do direct I/O with the disk. (Unless I am willing to write one using the libata / SCSI libraries, which can talk directly to ATA (PATA / SATA) devices using the ATA protocol (I've done this in DOS using assembly) and to SCSI devices using the SG / SCSI protocol.)

Anyway, thanks for all your input and guidance. At least it got me to do the O_DIRECT implementation. Please update this thread if there is any more information to share.

Thanks again for all the help. :b:

The crux is that the kernel doesn't do reads when or how you tell it to. It might bundle it with other reads, feed you data from cache, or make you wait until other reads are done. It may have to do a little or a lot of translation between the device and you. This is true for most multitasking operating systems.

Have you tried doing reads?

It may also be using Linux's ancient, obsolete, unportable and soon-to-be-removed raw system.


Most of data forensics is reading. Writing to an iffy disk is a very silly thing to do.

Yes, I have tried the read call. It does not return a failure for read operations on bad sectors.

It returns success and reports that it read the amount of data I requested.

The only way I can identify the error is by comparing / analysing the data read against the known pattern.

Thank you.

Another problem is that a failed read can take forever and a day. Most drive access is asynchronous, meaning the drive returns data whenever it feels like it, and can grind for minutes when it hits a sector it doesn't like.

Try getting rid of the cache for that disk with the posix_fadvise function. It might just be returning cache, even with O_DIRECT.
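A minimal sketch of that idea, assuming Linux; drop_cached_pages is a made-up name. Note that posix_fadvise() returns the error number directly instead of setting errno, and that a length of 0 means "from the offset to the end of the file", so (0, 0) covers the whole device. POSIX_FADV_DONTNEED is only advice to the kernel and dirty pages cannot be dropped, hence the fsync() first.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush pending writes, then ask the kernel to
 * discard its cached pages for the whole file or device.
 * Returns 0 on success or an errno value on failure. */
int drop_cached_pages(int fd)
{
    /* Pages that are still dirty are not discarded, so push them
     * out to the device first. */
    if (fsync(fd) != 0)
        return errno;

    /* offset 0, len 0: the advice applies to the entire file. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Calling this between the write pass and a read-back pass makes it more likely that the reads actually reach the drive, though the kernel is free to ignore the advice.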

okey. will try posix_fadvise() now.

You could also see if the position of the file descriptor matches what it ought to be by checking with lseek64.
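Something along these lines would do as a sanity check. offset_matches is a hypothetical helper, and plain lseek() is used here instead of lseek64() on the assumption of a 64-bit off_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: compare the descriptor's current offset with
 * the offset the caller expects after its reads or writes.
 * Returns 1 on a match, 0 on a mismatch, -1 if lseek() fails. */
int offset_matches(int fd, off_t expected)
{
    /* Seeking 0 bytes from SEEK_CUR reports the position without
     * moving it. */
    off_t now = lseek(fd, 0, SEEK_CUR);

    if (now == (off_t)-1) {
        perror("lseek");
        return -1;
    }
    if (now != expected) {
        fprintf(stderr, "offset is %lld, expected %lld\n",
                (long long)now, (long long)expected);
        return 0;
    }
    return 1;
}
```

After a write of len bytes at offset, the expected value would be offset + len; anything else means the write did not transfer what the return value suggested.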

Just attempting to read a disk sector is completely different from attempting to write a disk sector and then attempting to read the same sector back. That is why disk vendors have what are typically called low-level utilities for bad-sector scanning and the like.

Yep, you are right. For (IDE) PATA / SATA disks there are the ATA / ATAPI standards (t13.org), which specify how to query the device directly, and SCSI has loads of standards (t10.org) for interfacing with the drives. The ATA part can easily be coded in DOS / FreeDOS in assembly or C.

Win32 support is available too for directly querying / interfacing with the devices (with DDK / SDK APIs and device ioctl calls). There are multiple utilities from various HDD vendors which fall into one of the above two categories (DOS or Windows).

I would like to know whether Linux has anything like that. If I call libata functions directly in my program, would that allow me to talk to the drive directly, at least for (IDE) PATA / SATA drives?

Though Linux is programmer friendly, I feel not much control is given to the programmer. (Well, that also relates to security! :slight_smile: Unlike Windows, where hacking the kernel address space from user space leads to many security exploits / overflows, privilege escalations, and loads of other issues, not to mention BSODs! :))

I don't know if I am going into a vicious circle of kernel hacking / Linux abuse mode, but I just don't feel some things are right with Linux! :frowning:

I certainly mean no offense by the above words to the great minds here. They come purely from my frustration in trying to get the program to work at least partially, if not completely.

Now for update:

I have tried posix_fadvise, but the read / write calls lock up on the bad sectors, as you said.

I have removed O_DIRECT and am using O_WRONLY mode, with posix_fadvise called on the handle with the POSIX_FADV_DONTNEED flag for the entire device, from sector 0 to (last sector * 512). Is that the right option to use?

Is there any timeout option that can be set on the read / write operations, to tell the program / kernel to move on to the next sector / block once the current sector has taken too long?

I tried with FD_SET, but I believe it is only for socket descriptors and not for file handles. (So dumb of me! :frowning: )

Thanks for the replies.

You don't "call" libata. It's a device driver.

Au contraire. You can dump ISO images direct from CDROM drives. You can write to memory raw, and talk to raw I/O ports. In windows, this takes a device driver. In Linux, all it needs is device permissions. It's just more generally recognized in UNIX circles that this is a horrible idea in general, while old-fashioned Windows programmers are still reeling from the shock and betrayal of having to give up real mode at gunpoint ;p

Tell me, what does raw I/O even mean with a hard drive? Are you going to play with the DMA controller, set up interrupts, and send asynchronous requests yourself? Your idea of "raw I/O" hasn't much to do with what the drive is actually doing. Try the linux kernel mailing list if you're interested in truly raw I/O.

Congratulations, it's working. It's not the program that's locking up. The drive itself is trying to read the sector, failing, and taking many minutes of retrying before it gives up and informs the computer it can't.

The timeout is in the drive hardware itself. If it's configurable at all it might be one of the many things hdparm can do. Which incidentally might be something interesting to look at the source of for talking to drives on a low level.

Wow! It's all I can say!

That was a fantastic reply straight to temple! :slight_smile:

Yup! I read about the libata implementation and learned that I cannot use it from a program, as you pointed out!

You are right. It is just that my requirement is raw I/O with the drive, and I can see there is no point in the kernel offering it, as hardly anybody would use it.

And well said about the windows programmers. rofl!! :smiley: (seriously no offense windows programmers!) (me neither! :rolleyes:)

hdparm only works for IDE / SATA drives, and on certain systems it fails to send commands to the HDD. If the drive is in a really bad state, i.e., a SMART failure is detected, and / or if the BIOS restricts certain features based on SMART values (on certain IBM / HP / Dell systems), hdparm is unable to send commands. And sdparm lacks certain key options for SCSI drives as well (such as setting / clearing low-level parameters).

Also, according to the ATA specifications, the timeout applies only to certain operations and is on the order of nanoseconds (400 ns is the default, I believe). It might not help in this case anyway, as the request has to pass through the kernel / driver layers.

In one of the earlier posts, you mentioned the option of writing a driver to talk to the drives. Are there any general pointers you can give, or should I start with the libata / SCSI drivers?

I am extremely grateful for all the help / advice.