optimizing disk performance

I have some questions regarding disk performance, and what I can do to make it just a little (or much :)) faster.

From what I've heard, the first partitions will be faster than the later ones, because tracks at the outer edge of a hard drive platter simply move faster. But I've also read in a book that the most-used partitions should be placed in the middle of the range of partitions. This doesn't make much sense to me.
Does this "put-heavily-accessed-in-the-middle" thing apply to logical partitions?

Should the heavily accessed partition really go in the middle? Even if the first partitions are large and the later ones small, leaving the heavily accessed partition nearer the center (where the platter moves slower)...?

Any other suggestions (partition layout, filesystem recommendations, etc.) are highly appreciated :slight_smile:

I'm working with a Linux system running server software that does a lot of reading and writing, so every bit of speed I can gain is valuable.

Thanks in advance

I have to tell you that I don't really think this approach is a great idea. With Unix filesystems, it's too hard to keep a file precisely positioned in one spot. Unix wasn't meant to be used that way. But here are your answers...

In the olden days, disks had a fixed geometry: the first track and the last track held the same amount of data. As disk manufacturers chased greater data densities, they changed things so that the outer tracks now hold more sectors than the inner tracks. Some disk optimization papers were written in those olden days, and everything they say may no longer apply. This is the problem with exploiting disk geometry: it changes, and suddenly your hack is counterproductive.
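To make the zoned-recording point concrete, here's a rough back-of-the-envelope model. The zone sizes here are invented for illustration, not taken from any real drive's spec sheet:

```python
# Rough model of zoned bit recording: at constant RPM, one full track
# passes under the head per revolution, so sequential transfer rate is
# sectors/track * bytes/sector * revolutions/sec (ignoring head switches
# and seeks). Zone numbers below are made up for illustration.
RPM = 7200
REVS_PER_SEC = RPM / 60.0
SECTOR_BYTES = 512

# Hypothetical zones: (name, sectors per track)
zones = [
    ("outer zone", 1000),
    ("middle zone", 750),
    ("inner zone", 500),
]

for name, sectors_per_track in zones:
    rate = sectors_per_track * SECTOR_BYTES * REVS_PER_SEC
    print(f"{name}: {rate / 1e6:.1f} MB/s sequential")
```

With these made-up numbers the outer zone streams data twice as fast as the inner zone, purely because more sectors pass under the head per revolution.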

But in the olden days, since each sector could be read with equal speed, your primary concern was getting the disk heads to your sector. This is why putting the data in the middle of the disk is a good idea. The heads cannot be more than half a disk away. So the mean seek time is as low as you can get it. But this assumes that the heads might be anywhere on the disk. If you can guarantee that the heads are positioned over your data, seek time becomes less of an issue. One way to do this is to use only the outer tracks of each disk drive and ignore the inner 90% of the disk.
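Here's a toy simulation of that mean-seek argument, assuming the head's starting position is uniformly random across the platter (my simplifying assumption, to keep the model small):

```python
# Sanity check of the "middle of the disk" argument: if the head's
# resting position is uniformly random across positions 0..1, the mean
# seek distance to data at position p is the average of |x - p|.
import random

random.seed(0)
SAMPLES = 100_000

for p, label in [(0.0, "outer edge"), (0.5, "middle"), (1.0, "inner edge")]:
    mean_seek = sum(abs(random.random() - p) for _ in range(SAMPLES)) / SAMPLES
    print(f"data at {label}: mean seek distance ~ {mean_seek:.3f}")

# Prints roughly 0.5 for either edge and 0.25 for the middle: under this
# assumption, mid-disk placement halves the expected seek distance.
```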

If that's not possible, then it will depend on how much data is to be transferred. With large multi-sector transfers, the longer seek time may be offset by the faster transfer rate. The only way to be sure is to try it both ways and benchmark it.

And while you're at it, put the data on the inner tracks and benchmark that too. That should be the worst case: longest mean seek time and slowest transfer rate. It will give you a feel for how little benefit you're reaping from a lot of work.
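For what it's worth, here's a minimal sketch of the kind of sequential-read timing you could run against each placement. The target path is a placeholder for a file on whichever partition you're testing, and a real tool will do a much more thorough job:

```python
# Minimal sequential-read benchmark sketch. TARGET is hypothetical:
# point it at a big file on the partition under test. Flush or bypass
# the page cache between runs (e.g. read a file much larger than RAM),
# or repeat runs will mostly measure cached data.
import os
import time

TARGET = "/mnt/test/bigfile"   # placeholder path, adjust for your setup
BLOCK = 1024 * 1024            # read in 1 MB chunks
TOTAL = 256 * 1024 * 1024      # read 256 MB in total

fd = os.open(TARGET, os.O_RDONLY)
start = time.time()
done = 0
while done < TOTAL:
    chunk = os.read(fd, BLOCK)
    if not chunk:              # hit end of file early
        break
    done += len(chunk)
os.close(fd)

elapsed = time.time() - start
print(f"read {done / 1e6:.0f} MB in {elapsed:.2f} s "
      f"({done / 1e6 / elapsed:.1f} MB/s)")
```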

Thank you very much for that informative reply.
Yes, I've realized it's too much work for so little :slight_smile:

But I'll do some benchmarking anyway, just to test (I just downloaded IOZone). Any recommendations on other good benchmarking software?

And here's yet another hard disk tip I got from another book. It claims I'll get slightly better performance if I split my Linux installation across several disks, since more disk heads are working on the operating system. True, or just another worthless statement?

Thanks again
/J.P

That is absolutely true. At some point you have so many drives on an i/o path that they must wait for each other. Until you reach that point, the more drives the better.

I'm not a Linux expert. But on HP-UX, there are several ways to exploit more drives. One way is to just use RAID: a collection of drives that looks like one drive to the OS. But also with individual drives, HP (under LVM) supports striping, which lets you distribute a logical disk over several physical disks. HP also supports disk mirroring. With mirrors, a write must go to two or more drives, so writes take longer. But a read can come from any drive, so reads are quicker. Since you usually read much more often than you write, this is a performance win.
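To make that read/write asymmetry concrete, here's a toy throughput model. The per-drive bandwidth and the perfect-load-balancing assumption are mine, purely for illustration:

```python
# Toy model of why mirroring helps read-heavy workloads (illustration
# only; assumes perfect load balancing and ignores seek overhead).
DRIVE_MBPS = 50.0   # per-drive bandwidth, an arbitrary round number

def effective_mbps(n_mirrors, read_fraction):
    # A logical read can be served by any of the n mirrors, so aggregate
    # read bandwidth is n * DRIVE_MBPS. A logical write must land on
    # every mirror, so logical write bandwidth stays at DRIVE_MBPS.
    read_cost = read_fraction / (n_mirrors * DRIVE_MBPS)   # time per MB read
    write_cost = (1 - read_fraction) / DRIVE_MBPS          # time per MB written
    return 1.0 / (read_cost + write_cost)

for n in (1, 2, 3):
    print(f"{n}-way mirror, 80% reads: {effective_mbps(n, 0.8):.1f} MB/s")
```

With an 80%-read workload this prints roughly 50, 83, and 107 MB/s for one-, two-, and three-way mirrors: reads scale out with the mirror count, writes don't.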

If you don't have stuff like this on Linux, then you can still work at the application level. It's not enough to just have the drives; they must be involved with your i/o intensive work. You may be able to put one file on one drive, another file on the next drive, and so on.
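A sketch of what that could look like at the application level, with hypothetical mount points standing in for separate physical drives:

```python
# Application-level spreading: round-robin data files across mount
# points that live on different physical drives. The mount paths are
# hypothetical; substitute your own.
import itertools
import os

MOUNTS = ["/disk0", "/disk1", "/disk2"]   # one mount per physical drive
drive_cycle = itertools.cycle(MOUNTS)

def place(filename):
    """Return a path for filename on the next drive in the rotation."""
    return os.path.join(next(drive_cycle), filename)

for name in ("data.0", "data.1", "data.2", "data.3"):
    print(name, "->", place(name))
```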

I will recommend these two books:
1) Configuration and Capacity Planning by Brian Wong
2) Sun Performance and Tuning (Java and the Internet) by Adrian Cockcroft

If you're fighting against time, go to http://www.emc.com/techlib, then to:
Storage Systems > Guidelines for deploying EMC networked storage systems (topics include configuration guidelines for optimizing reliability, scalability, capacity, and availability in various environments).

Choose "EMC CLARiiON Fibre Channel Storage Fundamentals".

If you still have no time to read but have lots of budget, do RAID 10 (Oracle calls this SAME: Stripe And Mirror Everything).
If that can't help, most likely no I/O storage subsystem can help much. You will need application tuning.

Remember the 80/20 rule: 80% of the time, ONLY 20% of the data is actually being used.