I haven't had the time to do a test on Linux yet, but I just finished a test on my Windows XP desktop machine (NTFS). I'm not sure how valuable this test is, but it's very interesting... Please give your thoughts on this.
Randomly opening and closing one of 100 preselected files, 100,000 times in total, in directories containing different numbers of files (relative times, normalized to the 100-file case):
100 files: 100.0
1000 files: 100.4
10,000 files: 101.3
100,000 files: 109.6
1,000,000 files: 130.9
A performance hit of 30% when going from 100 to 1,000,000 files in a directory!
When I ran the tests again, they were not only faster, but the differences were almost zero:
100 files: 100.0
1000 files: 100.0
10,000 files: 100.6
100,000 files: 100.2
1,000,000 files: 100.3
Obviously, some caching is going on. So, if you open the same files over and over (and the number of files is small enough), it doesn't seem to matter how many files you keep in the directories.
This caching could mean that the performance hit above would have been larger if I had opened more than 100 distinct files. Another way of doing this test would be to read every single file in random order.
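That alternative test is not something I have run; a minimal sketch of what it could look like (reusing the same kind of directory path as my script below, purely as an illustration) is:

import os
import random

def open_all_random_order(directory):
    # Open every file in the directory exactly once, in random order,
    # so the cache cannot help with files it has not seen yet.
    names = os.listdir(directory)
    random.shuffle(names)
    for name in names:
        f = open(os.path.join(directory, name), "rb")
        f.close()

open_all_random_order("c:\\temp\\test1000000")

With 1,000,000 files this would of course take much longer per pass, so the number of passes would have to be reduced.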
Maybe I should have used the same 1,000,000 files in each test case and instead distributed them differently (100 files per directory, 1000 files per directory, etc.). But then other variables would have affected the results, such as how I distributed them: path depth, number of directories, and so on.
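For what it's worth, a rough sketch of that layout (a hypothetical helper I did not actually use) could look like this:

import os
import shutil

def distribute(all_files, target_root, files_per_dir):
    # Copy the same set of files into subdirectories holding a fixed
    # number of files each, e.g. 1000 files per directory.
    for index, path in enumerate(all_files):
        subdir = os.path.join(target_root, "dir%05d" % (index // files_per_dir))
        if not os.path.isdir(subdir):
            os.makedirs(subdir)
        shutil.copy(path, subdir)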
Details
I used a script to create files with random names of 10+3 characters (name plus extension). I copied the 100 files from the 100-file directory to the other directories, then added more files until each directory held its target count. The files were almost empty (72 bytes).
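That creation script isn't included here, but a minimal sketch of how such files could be generated (random 10+3-character names, a small fixed payload; the path and helper name are just for illustration) would be:

import os
import random
import string

def create_files(directory, count, payload_size=72):
    # Create 'count' files with random 10+3-character names and a small
    # fixed-size payload, roughly matching the test files described above.
    if not os.path.isdir(directory):
        os.makedirs(directory)
    for _ in range(count):
        name = "".join(random.choice(string.ascii_lowercase) for _ in range(10))
        ext = "".join(random.choice(string.ascii_lowercase) for _ in range(3))
        f = open(os.path.join(directory, name + "." + ext), "wb")
        f.write(b"x" * payload_size)
        f.close()

create_files("c:\\temp\\test100", 100)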
Then I ran a Python script that opened and closed randomly selected files (from the 100 files above) in each directory. The source code is:
import datetime
import random

def getMS():
    # Milliseconds since midnight, built from the current wall-clock time.
    dt = datetime.datetime.now()
    ms = dt.microsecond / 1000
    ms += dt.second * 1000
    ms += dt.minute * 60000
    ms += dt.hour * 3600000
    return ms

# The 100 file names that are opened in every test directory.
fh = open("files.txt", "r")
filenames = map(lambda fn: fn.strip(), fh.readlines())
fh.close()

random.seed()

NUMBER_OF_OPENS = 100000
TIMES_PER_CASE = 3
testcases = ["1000000", "100000", "10000", "1000", "100"]

for i in range(TIMES_PER_CASE):
    for testcase in testcases:
        starttime = getMS()
        for j in range(NUMBER_OF_OPENS):
            # Pick one of the 100 file names at random and open it in the
            # directory for this test case.
            filename = "c:\\temp\\test" + testcase + "\\" + random.choice(filenames)
            open(filename, "rb").close()
        endtime = getMS()
        print testcase, i, endtime - starttime
And the results (columns: directory size, run number, elapsed time in milliseconds):
C:\Temp>python -OO openfiles.py
1000000 0 16156
100000 0 13531
10000 0 12508
1000 0 12399
100 0 12346
1000000 1 12291
100000 1 12274
10000 1 11886
1000 1 11265
100 1 11117
1000000 2 11199
100000 2 11183
10000 2 11232
1000 2 11166
100 2 11166
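The relative numbers at the top were computed by normalizing each run against its 100-file timing; the first table corresponds to run 0 above (for example, 16156 / 12346 * 100 = 130.9). A quick sketch of that normalization using run 0's raw timings:

# Normalize run 0's raw timings (milliseconds) against its 100-file case.
raw = {"100": 12346, "1000": 12399, "10000": 12508, "100000": 13531, "1000000": 16156}
baseline = raw["100"]
for case in ["100", "1000", "10000", "100000", "1000000"]:
    print("%s files: %.1f" % (case, raw[case] * 100.0 / baseline))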
Machine Specifications
I ran the tests on my old desktop, a DELL Optiplex 280 with a Pentium 4 CPU (2.8 GHz), 2 GB of DDR2 SDRAM, and an 80 GB Serial ATA-150, 7200 rpm hard drive (cache size unknown).
I'm using Windows XP SP3 with NTFS. I shut down all anti-virus, indexing and updating services and most programs before running the tests.
The hard drive was defragmented after creating the small files and before running the tests. I also rebooted before running the tests.