Understanding read/write and kernel interaction

Krothos · September 16, 2009, 10:24pm

Ok, so I'm trying to finalize my understanding of read/write and kernel interaction.

read():

You have a library function whose first parameter is the open file to read from, whose second parameter is a pointer to a buffer (is this the location of a buffer in the user area or the controller buffer in the kernel?), and whose third parameter is the number of bytes to read.

So, once the controller stores words from disk into its own buffer, the DMA then transfers the data to the main memory and the CPU gains control. My question is, what now? The purpose was to read a certain amount of words. So, are those words being returned to the caller once they are in the main memory or is an address returned of where in the main memory those words are located?

write():

First param (which file to write to), second param (again, unsure: is this pointing to a buffer in the user area already filled in with the new words to modify the file with?), third param (how much you replace).

Main question entails how this even works as opposed to the read. Does the controller still need to store a certain amount of words from the disk into its own internal buffer?

I understood it like this: Controller's own hardware buffer somehow stores the words to modify the file. The DMA then uses the new block within the controller buffer to modify a file within the main memory.

Any help is appreciated

jlliagre · September 16, 2009, 11:57pm

and here a system call

a file descriptor

no kernel buffer but userland (virtual memory)

read isn't necessarily retrieving data from a disk through a controller. What is done depends on what the file descriptor points to. Might be a file, a raw device, the network, a serial port, a virtual device, ...
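To illustrate the point that read() does not care what sits behind the descriptor, here is a minimal sketch that reads from a pipe instead of a disk file. The helper name read_via_pipe and the 512-byte limit are assumptions made for this example, not anything from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: push msg through a pipe and read it back with the
 * same read() call one would use on a regular file. No disk or controller
 * is involved. Returns the number of bytes read, or -1 on error. */
ssize_t read_via_pipe(const char *msg, char *out, size_t outsz)
{
    int fds[2];
    size_t len = strlen(msg);

    /* Stay within the POSIX minimum pipe buffer so write() cannot block. */
    assert(len <= 512);

    if (pipe(fds) == -1)
        return -1;

    ssize_t w = write(fds[1], msg, len);   /* fds[1] is the write end */
    close(fds[1]);
    if (w != (ssize_t)len) {
        close(fds[0]);
        return -1;
    }

    ssize_t n = read(fds[0], out, outsz);  /* fds[0] is the read end */
    close(fds[0]);
    return n;
}
```

Calling read_via_pipe("Hello", buf, sizeof buf) with a 16-byte buf should leave the five bytes of "Hello" in buf, exactly as a read from a file would.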

These aren't words but bytes.

If by main memory you mean kernel memory, no. A process has no access to the kernel memory.
If you are asking if data is copied twice, then that depends on the kernel implementation. Generally, it is true but some OSes allow direct I/O from the driver to the user-supplied buffer.
See, for example, the Solaris directio function: directio(3C), "provide advice to file system" (man pages section 3: Basic Library Functions, Sun Microsystems).

Yes (bytes)

Same as read.

I don't get what you mean here.

Krothos · September 17, 2009, 12:13am

So once the bytes are transferred over to the main memory through the DMA, how is what was read returned to the caller? So if I wanted to read 3 bytes from a file containing "Hello", how is "Hel" returned to the library function? What sees this and how/where is it returned from?


What I'm talking about is what is needed for the write.

What I meant was this:

Your library function is passing a reference to a buffer storing the bytes to replace a certain block in a file (for example). So, the goal is just to replace those bytes in the file with what is in the buffer (the second argument passed in).

What I am confused about is HOW exactly the modification takes place. How those bytes within the user area buffer replace the target file block of bytes.

Are the bytes within the user area buffer somehow transferred over to the buffer which resides on the device controller for let's say, a disk (the file to modify is on the disk)? And then from there, what's in the buffer of the controller somehow replaces the targeted block within the file you are trying to modify?

So you're passing in bytes to modify some block within the target file. I don't get how this takes place for a write.

That was what I was saying. I don't get the process of how write takes place in terms of a file let's say.

jlliagre · September 17, 2009, 2:04am

That's the whole purpose of a system call. Passing data from/to the kernel.
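As a concrete sketch of the "Hello" case asked about above: the kernel copies the bytes into the buffer the caller supplied, and read() itself only returns how many bytes were placed there. The helper name read_prefix_of_hello and the /tmp scratch file are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file containing "Hello", then read
 * up to count bytes from its start into the caller's buffer.
 * Returns whatever read() returned: the byte count, or -1 on error. */
ssize_t read_prefix_of_hello(char *buf, size_t count)
{
    char path[] = "/tmp/read_demo_XXXXXX";  /* assumes /tmp is writable */
    int fd;

    assert(buf != NULL);

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);  /* the file stays usable through fd and vanishes on close */

    if (write(fd, "Hello", 5) != 5 || lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* The kernel copies the data into buf, which lives in user memory.
     * The return value is a count, not an address and not the data. */
    ssize_t n = read(fd, buf, count);
    close(fd);
    return n;
}
```

With count set to 3, the call should return 3 and leave 'H', 'e', 'l' in the first three bytes of buf. The rest of buf is untouched, and no terminating NUL is added.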

That's the picture, but the process is considerably more complex than your description.
There are several layers crossed by the data between your application and the disk blocks.
As you are talking about a file, the filesystem, file cache and possibly a log are playing a role. Writes are usually delayed, so you will need to wait for a flush for the data to be committed to disk. Also, some form of software or hardware RAID (mirroring/striping and the like), checksums or compression might take place, and the disk itself will certainly have a cache too.

Krothos · September 17, 2009, 2:42am

Yes but the DMA then puts what was read into the main memory which then becomes stored into the buffer which was originally passed in with the read. Got it.

I understand that but I'm trying to look at it from a somewhat high level. The way I see it now, for a write, you have a pointer to a buffer of what to write. You then write those bytes to the kernel buffer (or some controller buffer) from what was in the user buffer. What is in the controller buffer overwrites some bytes in the file. I know, as you said, there are many mitigating items to think about but is there a basic way you can explain how what is in the controller buffer (the bytes taken from the user buffer) overwrites bytes in a target file?

Thanks

Btw, I've read a ton of material. For whatever reason, people don't like digging further into how these things work. The only decent material which started to explain the write() in detail was an OS book which is out of print and unfortunately google books doesn't have the pages for that section.

jlliagre · September 17, 2009, 4:40am

A file is an abstract object. What is written is one or more blocks/sectors. The middle layers are responsible for making that happen properly.
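From the application's side, the abstraction looks like the sketch below: the program names a byte offset and hands over a user buffer, and the kernel layers decide which blocks actually change. The helper name overwrite_demo and the sample strings are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: start from a file holding "Hello, world", overwrite
 * 5 bytes at offset 7 with "WORLD", and read the whole file back into out.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t overwrite_demo(char *out, size_t outsz)
{
    char path[] = "/tmp/write_demo_XXXXXX";  /* assumes /tmp is writable */
    ssize_t n = -1;
    int fd;

    assert(out != NULL);

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    if (write(fd, "Hello, world", 12) == 12 &&
        lseek(fd, 7, SEEK_SET) == 7 &&       /* move the file offset */
        write(fd, "WORLD", 5) == 5 &&        /* replaces bytes 7..11 */
        lseek(fd, 0, SEEK_SET) == 0) {
        n = read(fd, out, outsz);
    }

    close(fd);
    return n;
}
```

The program never mentions sectors or a controller. It only says "these 5 bytes, at offset 7", and the read back should show "Hello, WORLD".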

jim_mcnamara · September 17, 2009, 7:33am

FWIW - directio in Solaris enables direct access, with restrictions.
For example, I/O data sizes are restricted by disk geometry (sector size).

A memory-mapped (mmap()) file is as close as you can get to direct I/O for a file on most UNIX systems using system calls like read/write.

NOTE: a successful write call does NOT guarantee that the data will be physically written to a file completely or correctly. See: sync, fdatasync, or google for 'synchronized I/O data integrity completion'

The reason for this comment is that the OP seems to assume the opposite i.e.,
successful write == successful data synchrony
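A minimal sketch of the distinction: write() succeeding only means the kernel accepted the bytes, and a separate fsync() (or fdatasync()) asks for them to be pushed to the storage device. The helper names write_durably and durable_write_demo are assumptions for this example. Even after fsync() returns, caches below the kernel may add caveats of their own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes, retrying on short writes, then ask
 * the kernel to flush them. Returns 0 on success, -1 on error. */
int write_durably(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);  /* may accept fewer bytes than asked */
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return fsync(fd);  /* a successful write() alone does not imply this step */
}

/* Hypothetical demo: returns the file size after a durable write, or -1. */
off_t durable_write_demo(void)
{
    char path[] = "/tmp/sync_demo_XXXXXX";  /* assumes /tmp is writable */
    off_t size = -1;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);

    if (write_durably(fd, "abc", 3) == 0)
        size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```

Checking the return value of fsync() matters here, because that is where a deferred I/O error may finally be reported.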

achenle · September 17, 2009, 8:38am

FWIW, Linux is a lot more restrictive about what it allows for direct IO than Solaris is. At least when comparing Solaris 10 and OpenSolaris to RHEL 5.2/5.3. AFAICT, at those release levels Solaris places no restrictions on the number of bytes transferred, but Linux requires full disk blocks and only full disk blocks.

Earlier versions of Solaris did require a page-aligned buffer, but that's no longer needed. I don't think Linux ever had a buffer alignment restriction for direct IO.

Also, to use direct IO on Linux, you need to open the file with the O_DIRECT flag.

I'd characterize mmap()'d IO as more "transparent" to an application than I'd characterize it as "direct". As far as I'm aware, memory-mapped IO pretty much always uses kernel caching of data just like buffered IO does, and that's pretty independent of your flavor of Unix.
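To show what "transparent" means in practice, here is a minimal sketch in which a file is modified by an ordinary memory store through a shared mapping, with no write() call for the modification itself. The helper name mmap_demo and the sample data are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a file holding "Hello", change its first byte
 * to 'J' through a shared mapping, then read the file back with pread().
 * Returns the number of bytes read back, or -1 on error. */
ssize_t mmap_demo(char *out, size_t outsz)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";  /* assumes /tmp is writable */
    int fd;

    assert(out != NULL);

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    if (write(fd, "Hello", 5) != 5) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    map[0] = 'J';                    /* plain store into the mapped page */
    if (msync(map, 5, MS_SYNC) == -1 || munmap(map, 5) == -1) {
        close(fd);
        return -1;
    }

    ssize_t n = pread(fd, out, outsz, 0);  /* read back through the fd */
    close(fd);
    return n;
}
```

The read back should show "Jello". The store goes to a page managed by the kernel's cache and is written out later, which fits the description of mmap()'d I/O as cached rather than direct.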