File alignment and performance... (difficult)

Gnaag · June 17, 2009, 2:46pm

Hello !

I will use the best English I can to explain my objective. I'm French, so pardon the lack of precision...

So, what I would like to do in a shell script (though you may answer "not possible in a script, you have to use a low-level language" or something like that) is described below. Everything in blue is already done and working.

Start from a list of files of the same size, sorted by name. The names are consecutive numbers, like 0001, 0002, 0003, etc.
Copy them into one big file by concatenating them one by one (done with dd and a big block size, because the I/O system is fast and the files are big).
Cut the big file into pieces to recreate the original files...
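
A minimal sketch of the concatenation step described above, assuming a POSIX shell and dd. The working directory, the tiny sample files and the 1 MiB block size are placeholders for illustration, not the real setup:

```shell
#!/bin/sh
# Sketch: concatenate equally sized, numerically named files into one big file.
# Paths, sample contents and block size are hypothetical.
set -eu

workdir=$(mktemp -d)
bigfile="$workdir/bigfile"

# Build three small sample pieces of identical size (stand-ins for the ~10MB files).
for name in 0001 0002 0003; do
    printf '%s\n' "$name" > "$workdir/$name"
done

# The shell expands the glob in sorted order, so zero-padded names stay in sequence.
: > "$bigfile"
for piece in "$workdir"/[0-9][0-9][0-9][0-9]; do
    # bs=1048576 (1 MiB) is only an illustrative block size; tune it for the storage.
    dd if="$piece" bs=1048576 2>/dev/null >> "$bigfile"
done

cat "$bigfile"
```

Zero-padded names matter here: without the padding, 10 would sort before 2 in the glob expansion.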

Before explaining why the "split" command does not work for me, let me clarify the context of my objective...

I'm trying to improve disk I/O performance for some groups of files by placing them physically close to each other on the hard drive. These are big files (about 10MB each) that have to be read in sequence (1, then 2, then 3 ...), and the hard drive head movement when file1 is far from file2 costs a LOT of performance.
As it is not possible to change the physical address of a file on a storage device, the objective is to "bluff" the filesystem: copy a lot of files into one big file, so that the filesystem will try to write that big file in adjacent sectors.
I don't want to make this post too long, but if you want more details I will gladly give some.

So, I don't want the split command because it copies from one source file to multiple destination files. As I said before, generating new files allows the filesystem to spread them all over the drive, and I lose performance again...

Could some other command help? Is it possible to cut one big file into pieces by only creating new entries in the inode table, to be as fast as possible?
Is there some solution other than a script?
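
For comparison, a minimal sketch of one way to avoid creating new files at all: read a single piece straight out of the big file by its offset. This is not the inode-level cut asked about, and it assumes the reading program can take its data from a stream. The helper name read_piece and the 4-byte sample pieces are hypothetical:

```shell
#!/bin/sh
# Sketch: extract one fixed-size piece from the big file by offset, with no new file.
set -eu

# read_piece BIGFILE PIECE_SIZE INDEX: write piece number INDEX (1-based) to stdout.
read_piece() {
    # With bs equal to the piece size, skip counts whole pieces and count=1 reads one.
    dd if="$1" bs="$2" skip=$(( $3 - 1 )) count=1 2>/dev/null
}

workdir=$(mktemp -d)
bigfile="$workdir/bigfile"
printf 'AAAABBBBCCCC' > "$bigfile"   # three 4-byte pieces as sample data

second=$(read_piece "$bigfile" 4 2)
echo "$second"   # prints BBBB
```

Since every piece has the same size, piece N always starts at (N - 1) times the piece size, so no index of offsets is needed.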

Thanks a lot for your help and your ideas !
Have a good day !

-----Post Update-----

Perhaps I should have posted this in the filesystem & disk section?

jim_mcnamara · June 17, 2009, 3:08pm

Yes. You are describing partitioned database tables. Each partition has some common key - like a date or a filenumber. Whatever you choose. And you can then sprinkle the data across many disks and effectively 'parallelize' I/O - thereby having dozens of I/O requests being worked on at the same time, instead of sequentially from a single I/O request queue.

Oracle (or Sybase or DB2 or MSSQL) can access data in those kinds of datasets much faster than you will probably be able to emulate with your method. Even Microsoft got in on the act with MSSQL:

Partitioned Tables and Indexes in SQL Server 2005

fpmurphy · June 18, 2009, 9:41am

What file system type are you using? ext3? ext4?

Gnaag · June 18, 2009, 12:10pm

I'm using CVFS (Quantum StorNext SAN shared FS).
But in my context it behaves like any other filesystem:

someone requests a write
I look in my free inode table
I look at the size of the file
I put the file in a free inode
...