I/O-bound computing clusters

I want to build a computing cluster and have been looking into grid solutions. My understanding of grid solutions is that participating nodes have to explicitly sign up to take part in a computation, and that an isolated piece of work is sent to a node in response to a request from that node (pull). By that reasoning, would a solution in which a controlling machine sends work to whichever node happens to be available not be a grid solution (push)?
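To make the pull model concrete, here is roughly what I imagine a participating node doing; the coordinator URL, endpoints and payload format below are made-up placeholders, not any particular grid product's API:

    # Pull-style worker sketch: the node asks a coordinator for work,
    # processes it, and posts the result back. Everything here
    # (URLs, JSON layout, do_work) is a hypothetical placeholder.
    import json
    import time
    import urllib.request

    COORDINATOR = "http://coordinator.example:8080"

    def do_work(job):
        """Application-specific processing goes here."""
        return {"status": "done"}

    def fetch_job():
        with urllib.request.urlopen(COORDINATOR + "/next-job") as resp:
            if resp.status == 204:      # coordinator has nothing for us
                return None
            return json.loads(resp.read())

    def submit_result(job_id, result):
        body = json.dumps({"job_id": job_id, "result": result}).encode()
        req = urllib.request.Request(COORDINATOR + "/result", data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    while True:
        job = fetch_job()
        if job is None:
            time.sleep(10)              # node is idle; poll again later
            continue
        submit_result(job["id"], do_work(job))

A push variant would simply invert the first step: the controller keeps the list of available nodes and sends work out to them.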

The problem we will be solving is data-intensive, so we are looking at an I/O-bound workload. What methodology is used when the data sits on one machine and the nodes work against that data? Could partitioning the database work, so that each node only works on the data in its own partition and nothing else?
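As an illustration of the kind of partitioning I have in mind (table, column and key names are invented):

    # Hash-partitioning sketch: node i of N only touches the rows whose
    # key falls into its bucket. Names are invented for illustration.
    import hashlib

    NUM_NODES = 8

    def bucket(key):
        """Map a row key to a partition number in a stable way."""
        digest = hashlib.md5(str(key).encode()).hexdigest()
        return int(digest, 16) % NUM_NODES

    def rows_for_node(rows, node_id):
        """Yield only the rows this node is responsible for."""
        for row in rows:
            if bucket(row["id"]) == node_id:
                yield row

    # The same restriction can be pushed into the query so that only one
    # slice ever leaves the data node, e.g.:
    #   SELECT * FROM measurements WHERE MOD(id, 8) = <node_id>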

Is it a database or a LUN? Either way you can get to the data, but you do have a wonderful chance of becoming I/O bound.

There are ways around this: create logical partitions, either as tables or LUNs, each on a separate physical LUN, tablespace file, or device.
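As a rough illustration of the per-device idea - assuming, purely for the sake of example, MySQL with MyISAM tables, and made-up table names and paths:

    # Generate one table per logical partition, each pinned to its own
    # physical device. Table names, paths and the MyISAM DATA DIRECTORY
    # option are illustrative assumptions only.
    PARTITIONS = {
        0: "/vol0/mysql-data",
        1: "/vol1/mysql-data",
        2: "/vol2/mysql-data",
    }

    def ddl_for_partition(part_id, data_dir):
        """Return DDL that places one partition's table on its own device."""
        return (
            "CREATE TABLE measurements_p{p} ("
            "  id BIGINT PRIMARY KEY,"
            "  payload BLOB"
            ") ENGINE=MyISAM DATA DIRECTORY='{d}';"
        ).format(p=part_id, d=data_dir)

    for part_id, data_dir in sorted(PARTITIONS.items()):
        print(ddl_for_partition(part_id, data_dir))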

Can you give us more information?

Thank you for your response.
The application runs on a database that is several hundred GB in size and currently resides on a single node. The other nodes are on either a local area network or a wide area network and are idle the vast majority of the time. Almost all nodes are multi-core machines and currently do not have a database. In other words, the nodes could be put to good use if they had easy access to the data (for instance by each having its own partition or LUN), or if the data were sent across via HTTP (which is likely to hurt performance, since I/O becomes the bottleneck).
Putting a logical partition on each of the nodes seems like a fruitful route, given that the results are no more than a few GB in size. At the same time, it also seems fairly rigid: if one of the nodes is down, the system needs to know that the missing results have to be recalculated somewhere else.
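The bookkeeping for that need not be elaborate; something along these lines on the controlling machine (all names and numbers are invented), where the partitions owned by a silent node simply go back on the queue:

    # Coordinator-side bookkeeping sketch: track which node owns which
    # partition, and if a node misses its heartbeat, re-queue its
    # partitions so another node can recompute the missing results.
    import time
    from collections import deque

    HEARTBEAT_TIMEOUT = 120      # seconds without contact => node is "down"

    pending = deque(range(32))   # partitions not yet assigned
    owners = {}                  # partition -> node
    last_seen = {}               # node -> timestamp of last heartbeat

    def assign(node):
        """Hand the next unclaimed partition to a node that asked for work."""
        last_seen[node] = time.time()
        if not pending:
            return None
        part = pending.popleft()
        owners[part] = node
        return part

    def reap_dead_nodes():
        """Re-queue every partition owned by a node we have not heard from."""
        now = time.time()
        dead = {n for n, t in last_seen.items() if now - t > HEARTBEAT_TIMEOUT}
        for part, node in list(owners.items()):
            if node in dead:
                del owners[part]
                pending.append(part)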

If you have results in some static form, you can dynamically mount NFS shares as needed to access those directories.

Is your "backbone" 1Gb or 10Gb? If you can create a subnet for the fast NICs and each UNIX box has a 10Gb NIC, this works very well - it is what we do now. We create a job's data, notify the other box, it NFS-mounts the dataset read-only, and away we go.
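On the receiving box the mount step can be scripted; a minimal sketch, assuming a client with mount_nfs (as on the BSDs) and made-up host, export and mount-point names:

    # Mount the job's dataset read-only over NFS, process it, unmount.
    # Host, export path and mount point are placeholders.
    import subprocess

    def process_dataset(path):
        """Placeholder for the real computation over files under path."""
        pass

    def run_job(host, export, mountpoint):
        subprocess.check_call(["mount_nfs", "-o", "ro",
                               "%s:%s" % (host, export), mountpoint])
        try:
            process_dataset(mountpoint)
        finally:
            subprocess.check_call(["umount", mountpoint])

    # run_job("databox", "/export/job42", "/mnt/job42")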

There is another issue to consider. Even though you may get great throughput, some boxes have issues. Solaris with older QLogic cards takes a hit on interrupts, because the CPU does a lot of work for the NIC.

All of this is a case of limiting factors, something you see in science a lot. When you raise the bar on one limit (CPU), some other resource becomes limiting (I/O in this case, or the interrupt stack). Just as it is not economically feasible to build highways that completely handle rush-hour traffic, so it is with computers. As long as it does not hurt production and you get more processing power, you are okay. You did pay for the hardware, so use it.

Thank you again for your response.
Our backbone is 1Gb as far as I know, but I would have to check. The bigger issue is the nodes on the WAN; we would be lucky to sustain 1MB/s on those lines. That means we should consider compressing/decompressing the results.
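For instance, with nothing but the Python standard library (file names are examples only):

    # Compress a result file before sending it over the slow WAN link,
    # and decompress it on the receiving end.
    import gzip
    import shutil

    def compress(src, dst):
        with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)

    def decompress(src, dst):
        with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)

    # compress("results_part7.dat", "results_part7.dat.gz")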
I will have our administrator look into setting up NFS mounts on the available nodes.
Our application stack is fairly standard: FreeBSD 8.x with a C++/MySQL/Python application. This should eliminate the diversity problem, but that does not mean we will not run into hardware issues. We may even have to assign the jobs greedily, to the node with the fastest CPU first.
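If it comes to that, the greedy assignment can be as simple as keeping the idle nodes in a heap keyed on some benchmark score; the node names and scores below are invented:

    # Greedy scheduler sketch: always hand the next job to the fastest
    # idle node. heapq is a min-heap, so scores are stored negated.
    import heapq

    idle_nodes = [(-3.2, "wan-node-1"), (-2.1, "lan-node-4"), (-4.0, "lan-node-2")]
    heapq.heapify(idle_nodes)

    def next_node():
        """Return the fastest idle node, or None if everything is busy."""
        if not idle_nodes:
            return None
        _score, name = heapq.heappop(idle_nodes)
        return name

    def mark_idle(name, score):
        """Put a node back in the pool when its job finishes."""
        heapq.heappush(idle_nodes, (-score, name))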