Filesystem Benchmarks for HDDs and SSDs

Hi,

I'm interested in storage benchmarks for various configurations in order to figure out what's best for a virtualization environment. The virtualization platform will be Proxmox, as it is currently my choice for the most manageable virtualization platform with plenty of features.

I want to look at the following configuration options, which may have an impact on performance:

  • filesystem
  • lvm
  • thin provisioning
  • transparent compression
  • multi disk technology (technology, raid level)
  • ssd caching

thin provisioning

Thin provisioning is a method of presenting virtually unlimited space while only allocating physical storage for the data that is actually written. So you can define multiple TB of disk capacity while only having a 250 GB SSD behind it. If that backend device fills up, you can add more storage when you need it. It's especially helpful in the age of SSDs, because they are still considerably more expensive, so you do not want to spend thousands of $ when you in fact do not need to. Furthermore, there are big differences between SSD products: SSDs for desktop use may be quite cheap, but server SSDs that are heavily written to are much more expensive.
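As an illustration, a minimal LVM thin provisioning sketch (the volume group name vg0 and the sizes are made up; ZFS achieves the same with sparse zvols):

# hypothetical: 200 GB thin pool carrying a 2 TB thin volume --
# only blocks that are actually written consume space in the pool
lvcreate --type thin-pool -L 200G -n thinpool vg0
lvcreate --thin -V 2T -n vm-disk-1 vg0/thinpool
lvs -o lv_name,lv_size,data_percent vg0    # watch how full the pool really is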

price example

  • normal consumer SSD: 500 GB M.2 SSDs start at about 80 € (Total Lifetime Write Capacity: 300 TB = 600 full writes)
  • datacenter SSD: a 375 GB Intel Optane SSD DC P4800X PCIe costs about 1200 € (Total Lifetime Write Capacity: 20.5 PB = 57,000 full writes)

filesystem and lvm

Many filesystems have interesting features which are helpful beyond pure performance, as well as drawbacks one would rather avoid:

  • PRO: zfs and btrfs have checksums and self-healing against data corruption.
  • PRO: zfs and lvm provide methods for thin provisioning
  • PRO: ext4 is easy to use. A simple fire-and-forget filesystem.
  • PRO: btrfs offers enormous flexibility
  • PRO: lvm has the flexibility to change configurations without downtime (see the sketch after this list)
  • CON: ext3 has quite long filesystem check times.
  • ...
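To illustrate the LVM flexibility point: a volume and its filesystem can be grown while mounted and in use. A small sketch with made-up device and volume names:

# add a new disk to the volume group and grow a logical volume online
pvcreate /dev/sdx
vgextend vg0 /dev/sdx
lvextend -r -L +100G /dev/vg0/data    # -r also resizes the ext4/xfs filesystem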

transparent compression

Transparent compression is a layer which reduces the amount of data written to and read from the raw disk, and may thus increase speed at the cost of CPU power.
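For reference, this is how it can be switched on for the two filesystems that offer it natively (pool/dataset and device names are placeholders):

# ZFS: enable lz4 compression on a dataset and check the achieved ratio
zfs set compression=lz4 tank/vmdata
zfs get compressratio tank/vmdata

# btrfs: mount with transparent zstd compression
mount -o compress=zstd /dev/sdx /mnt/vmdata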

multi disk technology (technology, raid level)

There are different multi-disk technologies available: Linux software RAID (md), LVM, btrfs RAID and zfs RAID. They combine the speed of multiple devices and add redundancy in order to cope with device failures without data loss.
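Two of the variants used in the benchmarks below, sketched with placeholder device names:

# Linux software RAID: a 4-disk RAID10 from whole disks
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]

# ZFS: striped mirrors (RAID10-like); use "raidz sda sdb sdc sdd" for a RAIDZ instead
zpool create tank mirror sda sdb mirror sdc sdd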

ssd caching

SSD caching can accelerate slower HDDs by keeping frequently used data on the fast SSD as a read cache, or by writing new data to the SSD first and syncing it to the slower hard disks in the background. This does not sacrifice data safety, because data written to the SSD is already persistent.
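Two common ways to set this up, as a rough sketch (volume group, pool and device names are invented):

# LVM/dm-cache: put an SSD-backed cache volume in front of a slow HDD LV
lvcreate -L 100G -n cache0 vg0 /dev/sde
lvconvert --type cache --cachevol cache0 vg0/slowdata

# ZFS: L2ARC read cache plus a separate log (SLOG) device for sync writes
zpool add tank cache sde
zpool add tank log sdf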

ceph - not an option here

Ceph is a very interesting technology. I'm not considering it, because the money needed to get it running with good performance is a lot higher than with plain disks and SSDs. You need at least 10 G networking, or better, which is a lot more costly than 1 G. You need fully equipped SSD storage, which is more expensive too. A big plus with Ceph is that you get redundant network storage, so you can immediately start virtual machines on other nodes if a compute node crashes. If money is no problem and maximum performance is not needed, Ceph would be an excellent choice. I have a 3-node cluster with Ceph up and running here. It works like a charm. Administration is easy and performance is fine.

In the following posts, I'll introduce my test environment and the benchmarking scripts.


Thanks a TON stomp for sharing this, please keep it up :b:

Thanks,
R. Singh

My test hardware is the following:


inxi -v2 -C -D -M -R

System:    Host: pvetest Kernel: 5.3.10-1-pve x86_64 bits: 64 Console: tty 1 Distro: Debian GNU/Linux 10 (buster) 
Machine:   Type: Desktop Mobo: Intel model: DQ67SW v: AAG12527-309 serial: BQSW133004FE BIOS: Intel 
           v: SWQ6710H.86A.0067.2014.0313.1347 date: 03/13/2014 
CPU:       Topology: Quad Core model: Intel Core i7-2600 bits: 64 type: MT MCP L2 cache: 8192 KiB 
           Speed: 1687 MHz min/max: 1600/3800 MHz Core speeds (MHz): 1: 2690 2: 3287 3: 3659 4: 3682 5: 1887 6: 3648 7: 3658 
           8: 2228 
Network:   Device-1: Intel 82579LM Gigabit Network driver: e1000e 
Drives:    Local Storage: total: 3.97 TiB used: 12.73 GiB (0.3%) 
           ID-1: /dev/sda model: N/A size: 930.99 GiB 
           ID-2: /dev/sdb model: 1 size: 930.99 GiB 
           ID-3: /dev/sdc model: 2 size: 930.99 GiB 
           ID-4: /dev/sdd model: 3 size: 930.99 GiB 
           ID-5: /dev/sde vendor: Intel model: SSDSC2MH120A2 size: 111.79 GiB 
           ID-6: /dev/sdf vendor: Samsung model: SSD 850 EVO M.2 250GB size: 232.89 GiB 
RAID:      Hardware-1: Intel SATA Controller [RAID mode] driver: ahci 
           Hardware-2: Adaptec AAC-RAID driver: aacraid 

The hard disks are SAS drives attached to the Adaptec RAID controller as single disks. One Intel SSD carries the OS filesystem. The other SSD is attached via a PCIe M.2 adapter. An additional M.2 SSD will be attached later for the SSD caching tests.

For the tests I will use fio - the flexible I/O tester - currently one of the most popular storage benchmarking tools.

My production scenario will be webhosting, so the workload will be roughly 25% write and 75% read. I will probably test that later, after the basic read/write tests.
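A fio invocation along these lines could model that mix later on (file location, size, queue depth and runtime are placeholders, not the job definitions used for the results below):

# hypothetical mixed workload: 75% random reads, 25% random writes
fio --name=webhosting-mix --directory=/mnt/test --size=4G \
    --rw=randrw --rwmixread=75 --bs=4k --iodepth=16 --numjobs=4 \
    --ioengine=libaio --direct=1 --runtime=120 --time_based --group_reporting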

First I'm making sure the device names I use are stable, so my tests will not accidentally overwrite the wrong disk. This can happen under Linux because there is no fixed device naming for storage devices; the ordering may be different at every reboot. And it actually is, as I have noticed.

So I'm checking the serial numbers and copying the device file names to unique names, which I will use from then on.

Regarding partitions: I try to avoid them and use whole disks instead, as it makes the procedure simpler.
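The core idea looks roughly like this (only a sketch with an invented serial number; the real script is linked below):

# show kernel name and serial number per disk; udev also provides
# serial-based symlinks under /dev/disk/by-id/ that survive reboots
lsblk -dno NAME,SERIAL
ls -l /dev/disk/by-id/
# e.g. create a fixed name for the benchmark scripts from such a link
ln -s "$(readlink -f /dev/disk/by-id/ata-*_S2RBNX0J502001Z)" /dev/bench_ssd1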

The git repository for the scripts is here:

GitHub - megabert/storage-benchmarks: Storage Benchmark Scripts

The script for creating the device names is this:

storage-benchmarks/mk_dev_names at master · megabert/storage-benchmarks · GitHub

So I reckon that RAID3 will be slightly better than RAID5 (unless you're going to use RAID10 with a large number of members).


Would love to see a zfs test at that ratio.
It should shine with a separate L2ARC device on SSD, once the cache gets warm.

If you intend to benchmark zfs as well, be sure to limit the ARC size in production scenarios, leaving <insert size> free for large application allocations if required.
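For what it's worth, on ZFS on Linux the ARC cap can be set like this (the 4 GiB value is just a placeholder):

# persistent: module option, applied at boot / module reload
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf
# or immediately at runtime
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max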

For KVM with ZFS inside the VM(s), more tuning will be required... I would not recommend ZFS inside virtual machines, with its additional layers on top of raw(s), qcow(s) or zvol(s).
Containers on the other hand work directly on the host's filesystem, so it should be interesting to see LXC performance on a zpool configured with L2ARC and log devices.

AFAIK transparent compression together with snapshots/clones etc. will be hard to find on Linux filesystems outside btrfs and zfs.

So it's XFS or EXT4 all the way, I'm afraid, with LVM inside hypervisors for flexibility.
Stripe it over those rust disks and explore LVM caching a bit (I have not used it, but it's there :) ).
You will have everything but transparent compression at your disposal.

Regards
Peasant.


I got some advice from a person who wrote his thesis on the subject of benchmarking:

  1. benchmark with applications that resemble the applications that will be used later.
  2. test with concurrent I/O requests (if that's your scenario, and it almost always is).
  3. test with small block sizes, as this will be the realistic workload for the storage in my case.
  4. test with virtual machines and over the network, so the I/O resembles what the system will see in production.

I'm already testing with small block sizes. Concurrent-job testing is running at the moment. Real-world testing will be done at some later point.
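For the concurrent small-block part, the fio calls look roughly like this (device name, job count and queue depth are placeholders, not the exact parameters of the runs below):

# many parallel 4k random reads against one of the renamed test devices
fio --name=concurrent-4k --filename=/dev/bench_hdd1 --rw=randread \
    --bs=4k --iodepth=32 --numjobs=8 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting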

--- Post updated at 04:53 PM ---

  1. Performance Baseline of the System

This benchmark is meant to demonstrate the raw speed of the system used. It's in no way relevant for the later workload; it's just to make sure the storage system is generally performing without major trouble.

Interesting: RAID-5 has slower write speeds than expected. ZFS RAIDZ, which is similar in its data distribution, is considerably faster (presumably because RAID-5 pays a read-modify-write penalty for partial-stripe writes, while RAIDZ's copy-on-write design largely avoids it).

Single-Disk, Sequential Read, Single Threaded, Test with 1M Block-Size. Bandwidth in KB/s

singledisk.zfs.seq_read.bs_1M.compressed.run_1.json                   129088
singledisk.zfs.seq_read.bs_1M.run_1.json                              129365
singledisk.ext3.seq_read.bs_1M.run_1.json                             140170
singledisk.ext3.seq_read.bs_1M.lvm.run_1.json                         140197
singledisk.ext4.seq_read.bs_1M.lvm.run_1.json                         148060
singledisk.ext4.seq_read.bs_1M.run_1.json                             148154
singledisk.btrfs.seq_read.bs_1M.lvm.run_1.json                        151971
singledisk.btrfs.seq_read.bs_1M.run_1.json                            154073

Single-Disk, Sequential Write, Single Threaded, Test with 1M Block-Size. Bandwidth in KB/s

singledisk.btrfs.seq_write.bs_1M.run_1.json                           112133
singledisk.btrfs.seq_write.bs_1M.lvm.run_1.json                       115764
singledisk.zfs.seq_write.bs_1M.compressed.run_1.json                  130645
singledisk.zfs.seq_write.bs_1M.run_1.json                             132107
singledisk.ext3.seq_write.bs_1M.lvm.run_1.json                        132526
singledisk.ext3.seq_write.bs_1M.run_1.json                            132902
singledisk.ext4.seq_write.bs_1M.lvm.run_1.json                        146049
singledisk.ext4.seq_write.bs_1M.run_1.json                            146220

Single-Disk, Sequential Read, Single Threaded, Test with 4K Block-Size. IOPS

singledisk.zfs.seq_read.bs_4k.run_1.json                               33963
singledisk.zfs.seq_read.bs_4k.compressed.run_1.json                    34625
singledisk.ext3.seq_read.bs_4k.run_1.json                              36023
singledisk.ext3.seq_read.bs_4k.lvm.run_1.json                          36028
singledisk.ext4.seq_read.bs_4k.lvm.run_1.json                          37882
singledisk.ext4.seq_read.bs_4k.run_1.json                              38013
singledisk.btrfs.seq_read.bs_4k.lvm.run_1.json                         38643
singledisk.btrfs.seq_read.bs_4k.run_1.json                             38647

Single-Disk, Sequential Write, Single Threaded, Test with 4K Block-Size. IOPS

singledisk.btrfs.seq_write.bs_4k.run_1.json                              890
singledisk.btrfs.seq_write.bs_4k.lvm.run_1.json                          895
singledisk.zfs.seq_write.bs_4k.run_1.json                               3026
singledisk.zfs.seq_write.bs_4k.compressed.run_1.json                    3189
singledisk.ext3.seq_write.bs_4k.run_1.json                              3465
singledisk.ext3.seq_write.bs_4k.lvm.run_1.json                          3476
singledisk.ext4.seq_write.bs_4k.lvm.run_1.json                          6972
singledisk.ext4.seq_write.bs_4k.run_1.json                              7237

Multi-Disk, Sequential Read, Single Threaded, Test with 1M Block-Size. Bandwidth in KB/s

raid10.zfs.seq_read.bs_1M.compressed.run_1.json                       267917
raid10.zfs.seq_read.bs_1M.raid10.run_1.json                           287920
raid5.ext3.seq_read.bs_1M.raid5.run_1.json                            291303
raid5.ext4.seq_read.bs_1M.raid5.run_1.json                            294709
raid10btr_native.btrfs.seq_read.bs_1M.raid10btr_run_1.json            298280
raid10.btrfs.seq_read.bs_1M.raid10.run_1.json                         310551
raid10.btrfs.seq_read.bs_1M.lvm.run_1.json                            317500
raid10.ext3.seq_read.bs_1M.raid10.run_1.json                          319551
raid5.btrfs.seq_read.bs_1M.raid5.run_1.json                           319678
raid10.ext3.seq_read.bs_1M.lvm.run_1.json                             327557
raidz.zfs.seq_read.bs_1M.run_1.json                                   329564
raid5.ext4.seq_read.bs_1M.lvm.run_1.json                              330541
raid5.ext3.seq_read.bs_1M.lvm.run_1.json                              335769
raidz.zfs.seq_read.bs_1M.compressed.run_1.json                        350659
raid10.ext4.seq_read.bs_1M.lvm.run_1.json                             354560
raid10.ext4.seq_read.bs_1M.raid10.run_1.json                          355618
raid5.btrfs.seq_read.bs_1M.lvm.run_1.json                             383868

Multi-Disk, Sequential Write, Single Threaded, Test with 1M Block-Size. Bandwidth in KB/s

raid5.btrfs.seq_write.bs_1M.lvm.run_1.json                             63088
raid5.btrfs.seq_write.bs_1M.raid5.run_1.json                           63716
raid5.ext4.seq_write.bs_1M.lvm.run_1.json                              68732
raid5.ext4.seq_write.bs_1M.raid5.run_1.json                            70874
raid5.ext3.seq_write.bs_1M.lvm.run_1.json                              73033
raid5.ext3.seq_write.bs_1M.raid5.run_1.json                            73329
raid10.zfs.seq_write.bs_1M.compressed.run_1.json                      197587
raid10.zfs.seq_write.bs_1M.raid10.run_1.json                          202224
raid10.btrfs.seq_write.bs_1M.lvm.run_1.json                           211278
raid10.btrfs.seq_write.bs_1M.raid10.run_1.json                        217614
raid10.ext3.seq_write.bs_1M.lvm.run_1.json                            221382
raid10.ext3.seq_write.bs_1M.raid10.run_1.json                         225781
raid10btr_native.btrfs.seq_write.bs_1M.raid10btr_run_1.json           244794
raid10.ext4.seq_write.bs_1M.raid10.run_1.json                         254144
raid10.ext4.seq_write.bs_1M.lvm.run_1.json                            254237
raidz.zfs.seq_write.bs_1M.compressed.run_1.json                       258104
raidz.zfs.seq_write.bs_1M.run_1.json                                  276822

Multi-Disk, Sequential Read, Single Threaded, Test with 4K Block-Size. IOPS

raid10.ext4.seq_read.bs_4k.raid10.run_1.json                           11616
raid10.btrfs.seq_read.bs_4k.raid10.run_1.json                          28261
raid10.ext4.seq_read.bs_4k.lvm.run_1.json                              32650
raid10.ext3.seq_read.bs_4k.raid10.run_1.json                           34177
raid10.zfs.seq_read.bs_4k.compressed.run_1.json                        58822
raid10.zfs.seq_read.bs_4k.raid10.run_1.json                            64036
raid5.ext3.seq_read.bs_4k.raid5.run_1.json                             70986
raid5.ext4.seq_read.bs_4k.raid5.run_1.json                             72429
raid10btr_native.btrfs.seq_read.bs_4k.raid10btr_run_1.json             74755
raid5.btrfs.seq_read.bs_4k.raid5.run_1.json                            76245
raid10.btrfs.seq_read.bs_4k.lvm.run_1.json                             77020
raid10.ext3.seq_read.bs_4k.lvm.run_1.json                              82203
raid5.ext3.seq_read.bs_4k.lvm.run_1.json                               83717
raidz.zfs.seq_read.bs_4k.run_1.json                                    85503
raidz.zfs.seq_read.bs_4k.compressed.run_1.json                         87909
raid5.ext4.seq_read.bs_4k.lvm.run_1.json                               88718
raid5.btrfs.seq_read.bs_4k.lvm.run_1.json                              96451

Multi-Disk, Sequential Write, Single Threaded, Test with 4K Block-Size. IOPS

raid5.btrfs.seq_write.bs_4k.lvm.run_1.json                               371
raid5.btrfs.seq_write.bs_4k.raid5.run_1.json                             374
raid10.btrfs.seq_write.bs_4k.raid10.run_1.json                          1238
raid10.btrfs.seq_write.bs_4k.lvm.run_1.json                             1255
raid5.ext3.seq_write.bs_4k.lvm.run_1.json                               1714
raid5.ext3.seq_write.bs_4k.raid5.run_1.json                             1762
raid10btr_native.btrfs.seq_write.bs_4k.raid10btr_run_1.json             1846
raid5.ext4.seq_write.bs_4k.lvm.run_1.json                               2313
raid5.ext4.seq_write.bs_4k.raid5.run_1.json                             2408
raid10.zfs.seq_write.bs_4k.raid10.run_1.json                            3256
raidz.zfs.seq_write.bs_4k.compressed.run_1.json                         3381
raidz.zfs.seq_write.bs_4k.run_1.json                                    3469
raid10.zfs.seq_write.bs_4k.compressed.run_1.json                        3638
raid10.ext3.seq_write.bs_4k.lvm.run_1.json                              4165
raid10.ext3.seq_write.bs_4k.raid10.run_1.json                           4373
raid10.ext4.seq_write.bs_4k.lvm.run_1.json                              5020
raid10.ext4.seq_write.bs_4k.raid10.run_1.json                           5442

--- Post updated at 04:56 PM ---

Why do you think that? I do not understand why RAID3 would perform better. As I understand it, it should be nearly the same; only the parity goes to one dedicated disk, and that disk is used heavily.

EDIT: Ah, now I think I understand. RAID3 uses byte-level striping instead of the block-level striping of RAID4/5, which may be calculated faster? Unfortunately, Linux software RAID does not support RAID3.

--- Post updated at 05:25 PM ---

  1. First insight: LVM does not seem to impact read/write throughput or IOPS performance

    Check the numbers by comparing neighbouring rows with and without LVM but otherwise identical settings. The numbers of those pairs differ only very little. I'll keep testing and watching the LVM performance readings, but I will not report them any more unless there is something worth mentioning.

    Single-Disk, Random Read, Single Threaded, Test with 4K Block-Size. IOPS

    singledisk.btrfs.random_read.bs_4k.run_3.json                            144
    singledisk.btrfs.random_read.bs_4k.lvm.run_3.json                        146
    
    singledisk.ext3.random_read.bs_4k.run_3.json                             150
    singledisk.ext3.random_read.bs_4k.lvm.run_3.json                         149
    
    singledisk.ext4.random_read.bs_4k.run_3.json                             147
    singledisk.ext4.random_read.bs_4k.lvm.run_3.json                         150
    
Single-Disk, Random Write, Single Threaded, Test with 4K Block-Size. IOPS
    singledisk.btrfs.random_write.bs_4k.run_3.json                           192
    singledisk.btrfs.random_write.bs_4k.lvm.run_3.json                       190

    singledisk.ext3.random_write.bs_4k.run_3.json                            290
    singledisk.ext3.random_write.bs_4k.lvm.run_3.json                        290
    
    singledisk.ext4.random_write.bs_4k.lvm.run_3.json                        304
    singledisk.ext4.random_write.bs_4k.run_3.json                            305
    
Multi-Disk, Random Read, Single Threaded, Test with 4K Block-Size. IOPS
    raid10.ext4.random_read.bs_4k.lvm.run_1.json                              66
    raid10.ext4.random_read.bs_4k.raid10.run_1.json                           66
    
    raid10.ext3.random_read.bs_4k.raid10.run_1.json                          149
    raid10.ext3.random_read.bs_4k.lvm.run_1.json                             150
    
    raid10.btrfs.random_read.bs_4k.raid10.run_1.json                         153
    raid10.btrfs.random_read.bs_4k.lvm.run_1.json                            151
    
    raid5.ext4.random_read.bs_4k.raid5.run_1.json                            154
    raid5.ext4.random_read.bs_4k.lvm.run_1.json                              153
    
    raid5.ext3.random_read.bs_4k.raid5.run_1.json                            155
    raid5.ext3.random_read.bs_4k.lvm.run_1.json                              156
    
    raid5.btrfs.random_read.bs_4k.raid5.run_1.json                           158
    raid5.btrfs.random_read.bs_4k.lvm.run_1.json                             159
    
Multi-Disk, Random Write, Single Threaded, Test with 4K Block-Size. IOPS
    raid5.btrfs.random_write.bs_4k.raid5.run_1.json                           64
    raid5.btrfs.random_write.bs_4k.lvm.run_1.json                             65
    
    raid5.ext4.random_write.bs_4k.lvm.run_1.json                              67
    raid5.ext4.random_write.bs_4k.raid5.run_1.json                            68
    
    raid5.ext3.random_write.bs_4k.raid5.run_1.json                            78
    raid5.ext3.random_write.bs_4k.lvm.run_1.json                              78
    
    raid10.btrfs.random_write.bs_4k.lvm.run_1.json                           311
    raid10.btrfs.random_write.bs_4k.raid10.run_1.json                        313
    
    raid10.ext4.random_write.bs_4k.raid10.run_1.json                         465
    raid10.ext4.random_write.bs_4k.lvm.run_1.json                            471
    
    raid10.ext3.random_write.bs_4k.raid10.run_1.json                         570
    raid10.ext3.random_write.bs_4k.lvm.run_1.json                            559
    
Multi-Disk, Random Read, Single Threaded, Test with 4K Block-Size. Bandwidth in KB/s
    raid10.btrfs.random_read.bs_4k.lvm.run_1.json                            604
    raid10.btrfs.random_read.bs_4k.raid10.run_1.json                         613

    raid10.ext3.random_read.bs_4k.lvm.run_1.json                             600
    raid10.ext3.random_read.bs_4k.raid10.run_1.json                          594

    raid10.ext4.random_read.bs_4k.lvm.run_1.json                             265
    raid10.ext4.random_read.bs_4k.raid10.run_1.json                          265

    raid5.btrfs.random_read.bs_4k.lvm.run_1.json                             636
    raid5.btrfs.random_read.bs_4k.raid5.run_1.json                           630

    raid5.ext3.random_read.bs_4k.lvm.run_1.json                              622
    raid5.ext3.random_read.bs_4k.raid5.run_1.json                            620

    raid5.ext4.random_read.bs_4k.lvm.run_1.json                              610
    raid5.ext4.random_read.bs_4k.raid5.run_1.json                            617
    
Multi-Disk, Random Write, Single Threaded, Test with 4K Block-Size. Bandwidth in KB/s
    raid10.btrfs.random_write.bs_4k.lvm.run_1.json                          1245
    raid10.btrfs.random_write.bs_4k.raid10.run_1.json                       1250

    raid10.ext3.random_write.bs_4k.lvm.run_1.json                           2234
    raid10.ext3.random_write.bs_4k.raid10.run_1.json                        2279

    raid10.ext4.random_write.bs_4k.lvm.run_1.json                           1884
    raid10.ext4.random_write.bs_4k.raid10.run_1.json                        1858

    raid5.btrfs.random_write.bs_4k.lvm.run_1.json                            258
    raid5.btrfs.random_write.bs_4k.raid5.run_1.json                          257

    raid5.ext3.random_write.bs_4k.lvm.run_1.json                             312
    raid5.ext3.random_write.bs_4k.raid5.run_1.json                           312

    raid5.ext4.random_write.bs_4k.lvm.run_1.json                             268
    raid5.ext4.random_write.bs_4k.raid5.run_1.json                           270
    