SAS or SSD for Ubuntu 14.04 and data analysis

I am in the process of building a workstation and have a question related to performance. I am a scientist who deals with big data (average file size 30-50gb). My OS is ubuntu 14.04 and so far I have a 128gb dual xeon E5-2630 with 6 cores each. I/O buffering is an issue so I am adding a 256/512? PCIe card and either 2 SSD or SAS drives for the OS and software. Since the PCIe will be separate its main purpose will be for file transfer, so would a SAS or SSD be a better fit for the OS? I am leaning towards SAS for the buffering issue, but wanted to ask more knowledgeable users. I forgot to mention that there will be a separate 1 or 2TB drive. Any recommendations for the size of the SAS or SSD? Thanks :).

do you access your data files randomly or sequentially?

I access files sequentially. Thank you :).

SSD is more than an order of magnitude (or much) faster than SAS high-rpm disks.
SSD is limited - usually to 1-2 TB of storage. With 128GB of memory, you could easily use SSD disks to load whatever file you want into memory - e.g., usual term is a RAMDISK. Ubuntu supports this. It also caches files very effectively without much human intervention other than configuration.

Learn about pdflush: The Linux Page Cache and pdflush

There is also vmtouch. You can force any file to be read entirely into memory. Which would definitely favor SSD.

Hoytech Also note some other tools on that site.

So, I would suggest: SSD's and vmtouch (or an analagous tool.)

1 Like

You might also consider m sata ssd
Samsung SSD 840 EVO mSATA | Samsung SSD

1 Like

Not directly related but i had a longer workshop yesterday about our new storage system (EMC VMax 200k). EMC claims that they had intended the 300GB 15k-SAS drives for high-performance, but phase them out now because (quoting from memory) with the development of Flash-SSDs its just not worth it any more. They also claim that, because they use SLC-based hardware, they have even lower rates of disk-replacement, even in heavy-duty transactional storage systems, than with rotational disks, to which a much lower energy consumption of the SSDs compared to the 15k-SAS disks contributes. There is simply less heat involved and that shows when you pack some ~2500 disks into a rack.

You haven't said where you are going to place the workstation, but in case it is going to be somewhere near your desk: 15k-disks are awefully LOUD in addition to be premier heating devices while SSDs are completely silent.

I hope this helps.

bakunin

2 Likes

Thank you all :).