SAN vs. Local disk.

I am in the market to purchase a new E950 server and I am trying to decide between local SSD drives and an SSD-based SAN. The application that will run on this server is read-intensive, so I am looking for the optimal configuration to support it. No other servers or applications will use this SAN (if I decide to go that route). The deciding factor for me is performance, regardless of hardware cost (granted, I don't want to pay for something I end up not using). SAN or local SSD (both running RAID 10)? That is really the question I am trying to answer before I pull the trigger and complete this purchase. Any insights from this community are greatly appreciated.

If it helps, here is the configuration I am currently looking at for a server with local storage. The NVMe disks will be used as boot devices. No plans to run VIOS or create another LPAR; just one instance supporting one application.

  • 1 9040-MR9 IBM E950 Power 9 Rack Mount Server, includes:
  • 4 EPWR IBM 8 Core 3.6/3.8 P9 Processor (32 Core Total)
  • 32 EPWV 1 Core Processor Activations (32 total)
  • 4 EB3M Power Supply - 2000W
  • 4 EM03 Memory Riser Card
  • 32 EM6B 16GB DDR4 Memory DIMMs (512GB total)
  • 2 EC5J 800GB NVMe Drives (Boot/OS Drives)
  • 7 ESHU 1.86TB SFF-3 SSD Drives (RAID 10 with one spare)
  • 2 EJ0L PCIe RAID Quad Port Adapter with 12GB cache, 6 Gbps per port

It all comes down to the homophones cash & cache.

How much is your budget for cache? That's the key, really. SSD is fast, but still slower than cache.

For write operations, you have to weigh the time it takes to commit the update to real disk (even if it is SSD) between the two options. If you pass the update to a SAN, it will respond very quickly to say the data has been written, but it will actually write it to disk in its own time: the update is cached for write and you can continue, with cache batteries protecting against power loss before it is really written. For local disk, it depends. Does the RAID controller have a good cache allocation, and would it therefore behave in the same way? If not, you (the operating system) must ensure the write is complete before you proceed (costing CPU sys time, I think), and that can, confusingly, make local I/O slower.
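If you want to see whether a write cache is absorbing the commits on whichever storage you end up with, a quick and dirty test is to compare buffered writes against writes that force every block to be committed. A minimal sketch, assuming GNU dd is available (the native AIX dd may not know oflag) and that /data/ddtest is a throwaway file:

    # Buffered writes: the cache (OS, RAID controller or SAN) may acknowledge early.
    time dd if=/dev/zero of=/data/ddtest bs=4k count=100000

    # Synchronous writes: each block must be committed before dd continues.
    # A large gap between the two timings suggests a write cache is doing the work.
    time dd if=/dev/zero of=/data/ddtest bs=4k count=100000 oflag=dsync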

You have, of course, stated that this is a read-intensive server, so the other thing to consider is cache/RAM in the server. The server will fill up with the data you read in normally anyway, but if you wish, you could pre-read the data to give it a head start. Beware that you need lots of memory for this, or you will just drop it again. You can simply find the data files you want and cat them to /dev/null so that they get read. Is 512GB sufficient for your data? You don't say how much you have.
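As a minimal sketch of that pre-read, assuming the application data lives somewhere like /data (adjust the path to suit):

    # Read every file under /data once so it lands in the OS file cache.
    # Only useful if the working set actually fits into RAM.
    find /data -type f -exec cat {} + > /dev/null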

I hope that my thoughts help,
Robin

There are a few more points to consider IMHO:

1) Disks (regardless of technology) will malfunction over time and need to be replaced, and there is some effort involved in such a replacement. SAN systems usually have ways built in to make replacing disks more or less "effortless", because they are designed to deal with a lot of disks, and the chance that one disk malfunctions rises with the number of disks involved. You might want to do some risk calculation based on how often, on average, you expect a disk to break (there is usually an "MTBF" - "mean time between failures" - figure available), how long you expect the replacement to take, and how much a downtime of that duration will cost.
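As a rough sketch of that arithmetic (the numbers are purely illustrative, not vendor figures):

    # Expected disk failures per year for the whole array, and what the resulting
    # outages would cost. Replace the values with your own MTBF, outage length and
    # downtime cost; under RAID 10 with a hot spare a single failure may cause no
    # outage at all, in which case only the replacement effort counts.
    DISKS=8; MTBF_HOURS=2000000; HOURS_PER_YEAR=8760
    OUTAGE_HOURS=2; COST_PER_HOUR=5000
    echo "scale=4; $DISKS * $HOURS_PER_YEAR / $MTBF_HOURS" | bc
    echo "scale=2; $DISKS * $HOURS_PER_YEAR / $MTBF_HOURS * $OUTAGE_HOURS * $COST_PER_HOUR" | bc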

2) SANs and local disks differ in the way they are attached to the system. Local disks may use SCSI/SAS or the M.2 interface; notice that the number of local (SSD) disks you can attach via M.2 is quite limited. A SAN, on the other hand, may use an FC connection, or even several FC connections in parallel (FC drivers allow this for redundancy as well as load balancing). The capacity of such FC connections may by far exceed the speed of local disks. On the other hand, you will need not only the SAN itself but also an FC switch (Brocade being the most widespread), and the administration ("zoning") will be more complicated.
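If you do go the FC route, a quick way on AIX to confirm that several paths are really active for a SAN disk is something like the following (hdisk2 is just an example name; check your own devices):

    # One line per FC path and its state (Enabled, Failed, ...).
    lspath -l hdisk2
    # Path-selection algorithm MPIO uses for this disk (e.g. round_robin).
    lsattr -El hdisk2 -a algorithm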

3) SANs - if set up with this in mind - may add redundancy and thus high availability to the system. Again, it depends on the system, its purpose, etc. to properly calculate the risk of it failing for some amount of time. Calculate how much it would set you back to have the system offline for 1 hour / 1 day / 1 week, and this will give you an idea of how much money spent on preventing these kinds of disasters is worthwhile.

4) SAN systems are rather expensive in themselves. On the other hand, they become cheaper and cheaper (in comparison to local disks) the more you virtualise and the more systems use them. So a plan to buy a SAN for a single system only might be on the expensive side, but with the expectation of adding other systems later, the costs may still be reasonable. You may therefore want to rethink your immediate problem in a more global context.

5) Notice that a system optimised for speed needs an adequate backup solution. For this, too, you need a disaster scenario plan to estimate what an outage could cost and hence how much the prevention may cost. Then you know how fast a recovery needs to be and therefore which technologies you need to employ to get that speed. A SAN can also be used for snapshots (perhaps the quickest way of recovering a "point in time") and as a very fast medium to put an online backup on, only then migrating it to slower media like tape. You might want to take that into consideration as well.
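As a minimal sketch of that staging idea (the paths and the tape device are only illustrative, and a SAN snapshot itself would be vendor-specific, so it is not shown):

    # Full archive of /data to a file on fast (SSD/SAN) staging storage,
    # then copy the image off to tape at a quieter time.
    tar -cf /backup_stage/data_image.tar /data
    dd if=/backup_stage/data_image.tar of=/dev/rmt0 bs=64k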

I hope this helps.

bakunin