Building a Linux cluster for mechanical engineering software

Hello everybody,

I'm new to the forum, so first of all I'd like to say hello to everybody.

I'm also new to HPC, but I urgently need to get up to speed on it.

My issue:
I'm a mechanical engineer specialising in simulation, such as fluid dynamics (CFD) and FEM. In particular, I write software for these kinds of simulations.

Everyone who knows CFD already knows what resources such a simulation needs. Until now I have worked on a workstation (Dual-Core Xeon E5), but now I have to build a cluster, in particular one that can manage jobs in different queues.

My requirements:

  • It should be a Linux cluster.
  • I want to build the cluster from a few dual-core Xeon servers (two to begin with); more machines will follow soon.
  • For the special software tools (Ansys CFD), a clone of RHEL 6.3 is necessary (Scientific Linux 6.5 is already working).
  • For the user GUIs (meshing for CFD or monitoring the simulation), an X server that is reachable via RDP and/or NX is needed.
  • For the software I write myself, I want to gain experience with GPGPU and MPI programming (not so important; this can wait).

I think those are the important requirements for the cluster. I have searched for a long time, but I haven't found much information about building HPC clusters, nor any good literature on the subject.

Hence my questions:

  • Is there a good HowTo for building such a cluster?
  • Which (special) hardware do I need?
  • Which software do I need for all that (user administration, parallel filesystem, batch system to manage the jobs, cluster monitoring, MPI, ...)?
  • Can I use only open-source software?
  • Should I use a VM with another Linux as the base?

I know these are a lot of questions, but I don't see another way and time is running out.

I'd be very grateful for any helpful answer. Thank you in advance!

Greetings

Just a few thoughts:

  • Don't overlook Lustre, a high-bandwidth parallel distributed file system that fills the role NFS plays on smaller setups.
  • A VM is a step in the opposite direction, but for some things it can be appropriate. Watch your reliability and administrative models, as every additional VM puts that much more load on them.
  • Clusters are usually homogeneous. There are other tactics for distributed processing that are more heterogeneous-friendly.
  • A remote desktop like VNC often performs much better than plain remote X, because the X server runs on the host and the latency between it and the applications is low.

Thank you for your answer.

I didn't know Lustre before; I will take a look at it.
On your second thought: I would rather avoid running a VM, but if there is no other way, I would do it.
I don't quite understand what you mean by homogeneous and heterogeneous. Which other tactics are available?

If I want remote access with VNC, I need an X Window System, right? Or am I mistaken?

But I asked a lot of other questions in my first post. Is there anybody else who can help me?

Some of your questions are so vague that it is hard to make any informed suggestions. How would you respond if you got a request from someone to tell them how to choose the best vehicle? (Who is going to be driving it? How many passengers do you need to carry? How much weight do you need to be able to tow? How much secured cargo space do you need? What are the weather conditions where it will be driven? What type of terrain does it need to traverse? ...)

I know very little about ME and nothing about Ansys CFD. Are you trying to build a cluster to support hundreds of users submitting thousands of jobs? Are you trying to build a cluster that can break a single huge job into thousands of threads and run all of those threads simultaneously? Do you have any experience writing thread-safe code?

Can you use only open-source software? Of course you can! You can write all of the code you need and make it available for everyone to use as they see fit.

Does open-source software already exist for all of the code you want to run? How can we guess at that from what you've told us? We have no idea what all of the code you want to run needs to do.

If you don't know the difference between a heterogeneous cluster and a homogeneous cluster, you probably don't have the background needed to design the cluster you want. Please consider hiring an architect with experience setting up and running an HPC data center who you can sit down with and discuss budget, capabilities, computing projects to be run, users to be supported, software to be run, software to be written, etc., etc., etc. Setting up an HPC data center is a very complex, expensive undertaking.

I was expecting this answer, but please don't get me wrong; I'm thankful for it.

Of course I'm a beginner in this topic, which is why I'm trying to gather information about it.
I'm aware that no general answer can be given when I ask "how to build a cluster".
It doesn't have to be a cluster for thousands of people with more than 500 nodes, just one for an institute (15 - 20 users). But before it can be built and used, someone has to gather the information, and that is my part. I'm looking for sources to learn from; unfortunately there are few of them, or I just can't find them.

My intention is to start from zero on the HPC topic and then take gradual steps in the right direction. For that I have test hardware to build a "little" cluster and gain some first experience.

The goal of the entire project is to decide whether to build the needed cluster on my own. As always, the point is to save costs.

I think if you embark on something that has the potential to become quite large, you should not overreach. Start with something that works for you, even if not all the details are implemented yet. More than likely you will find that you do not need everything you listed.
You will get very far with Scientific Linux, as you already mentioned. Have you also looked into Rocks Cluster (www.rocksclusters.org)? It has a lot of what you will need right out of the box.

VNC on X Window platforms creates a local virtual X desktop that supports your choice of window manager, has low latency, and can be viewed with a plethora of viewers on your client machine of choice. I am using a Java viewer, as I lack local admin rights. The X TCP or Unix sockets stay inside the host for minimum latency (unless you point off-host X clients at it), and a VNC socket connects the viewer. You can run the VNC TCP connection through an ssh tunnel for security.
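
To make the ssh tunnel part concrete, here is a rough sketch that only builds the command line; it does not start anything. It relies on the usual convention that VNC display :N listens on TCP port 5900 + N. The user name, host name and local port are made-up placeholders, not anything from this thread.

```python
import shlex
from typing import List

VNC_BASE_PORT = 5900  # by convention, VNC display :N listens on TCP port 5900 + N


def vnc_port(display: int) -> int:
    """Return the TCP port a VNC server on display :<display> normally uses."""
    if display < 0:
        raise ValueError("display number must be non-negative")
    return VNC_BASE_PORT + display


def ssh_tunnel_command(user: str, host: str, display: int,
                       local_port: int = 5901) -> List[str]:
    """Build an ssh command that forwards local_port to the VNC port on host.

    -N: do not run a remote command, just forward ports.
    -L: local port forwarding, local_port -> localhost:<vnc port> as seen from host.
    """
    forward = f"{local_port}:localhost:{vnc_port(display)}"
    return ["ssh", "-N", "-L", forward, f"{user}@{host}"]


if __name__ == "__main__":
    # Hypothetical user and host, for illustration only.
    cmd = ssh_tunnel_command("alice", "headnode.example.org", display=1)
    print(shlex.join(cmd))
```

With the tunnel running, you would point the VNC viewer at localhost and the local port instead of at the cluster host, so the VNC traffic travels inside the ssh connection.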

Heterogeneous clustering requires smarter load balancing and code compatibility or porting. Java is portable, compared to C++/g++, which produces code specific to the CPU and O/S, but g++ is still very widely available, so you can compile locally compatible code.
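
As a toy illustration of what "smarter load balancing" can mean on mixed hardware: instead of giving every node the same number of work items, hand them out in proportion to a per-node speed figure. This is only a sketch of the idea, not how any particular batch system does it; the node names and speed numbers are invented, and in practice you would measure them with a benchmark.

```python
from typing import Dict


def split_work(total_items: int, node_speeds: Dict[str, float]) -> Dict[str, int]:
    """Split total_items across nodes in proportion to their relative speed.

    Uses cumulative rounding so that the shares always add up to total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not node_speeds or any(s <= 0 for s in node_speeds.values()):
        raise ValueError("need at least one node, all with positive speed")

    total_speed = sum(node_speeds.values())
    shares: Dict[str, int] = {}
    cumulative_speed = 0.0
    assigned = 0
    for name, speed in node_speeds.items():
        cumulative_speed += speed
        # Number of items that should be handed out once this node is included.
        target = round(total_items * (cumulative_speed / total_speed))
        shares[name] = target - assigned
        assigned = target
    return shares


if __name__ == "__main__":
    # Hypothetical nodes: the "new" one is assumed to be three times as fast.
    print(split_work(100, {"new-node": 3.0, "old-node": 1.0}))
```

On a homogeneous cluster all speeds are equal and this reduces to an even split, which is one reason homogeneous setups are simpler to run.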

VM, in the virtual memory sense, makes sense. In practice, very few modern systems page much, and it makes the environment that much more robust. It can support huge sparse matrices in an mmap()'d space, which is key to many problems.
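
A small sketch of the mmap() idea, using only the Python standard library: a matrix of float64 values lives in a memory-mapped file, and only the entries you actually touch get paged in. The helper names (`open_matrix`, `set_entry`, `get_entry`) and the matrix size are my own choices for illustration. Whether the untouched parts of the file really take no disk space depends on the filesystem supporting sparse files.

```python
import mmap
import os
import struct
import tempfile

ENTRY = struct.Struct("<d")  # one little-endian float64 per matrix entry


def open_matrix(path, rows, cols):
    """Create a file large enough for rows x cols float64 values and map it."""
    size = rows * cols * ENTRY.size
    f = open(path, "w+b")
    f.truncate(size)  # extends the file; unwritten regions read back as zero bytes
    return f, mmap.mmap(f.fileno(), size)


def set_entry(mm, cols, i, j, value):
    """Write one entry at row i, column j (row-major layout)."""
    ENTRY.pack_into(mm, (i * cols + j) * ENTRY.size, value)


def get_entry(mm, cols, i, j):
    """Read one entry at row i, column j; untouched entries come back as 0.0."""
    return ENTRY.unpack_from(mm, (i * cols + j) * ENTRY.size)[0]


if __name__ == "__main__":
    rows = cols = 2048  # 32 MiB of address space, of which very little is touched
    with tempfile.TemporaryDirectory() as tmp:
        f, mm = open_matrix(os.path.join(tmp, "matrix.bin"), rows, cols)
        set_entry(mm, cols, 1500, 42, 3.5)
        print(get_entry(mm, cols, 1500, 42), get_entry(mm, cols, 0, 0))
        mm.close()
        f.close()
```

For real work you would more likely reach for a sparse matrix library, but the sketch shows why virtual memory helps: the address space can be far larger than what is ever resident in RAM.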

Going highly parallel on CPU and RAM suggests that network and file access will become the bottlenecks, so yes, you need to put a lot of work into making them as parallel as possible, too. The network fabric needs multi-path switches and high bandwidth. If you go fiber with either, remember that higher bandwidth does not mean lower latency, so problems may need to be structured to tolerate it. Network and file access have been turning into the same problem, as more and more files live remotely from the host.
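
To show why latency matters separately from bandwidth, here is a back-of-the-envelope model: the time for a message is roughly latency plus size divided by bandwidth. It is a simplification that ignores congestion, protocol overhead and overlap of transfers, and the latency and bandwidth numbers below are invented for illustration, not measurements of any real fabric.

```python
def transfer_time(message_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float, messages: int = 1) -> float:
    """Estimate the time to send `messages` messages of `message_bytes` each.

    Simple model: every message pays the full latency once, plus its
    size divided by the link bandwidth. Messages are sent one after another.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return messages * (latency_s + message_bytes / bandwidth_bytes_per_s)


if __name__ == "__main__":
    latency = 50e-6   # assumed 50 microseconds per message
    bandwidth = 1e9   # assumed 1 GB/s link

    # The same 1 MB of data, sent as one big message or as 1000 small ones.
    one_big = transfer_time(1_000_000, latency, bandwidth)
    many_small = transfer_time(1_000, latency, bandwidth, messages=1000)
    print(f"one message:   {one_big:.6f} s")
    print(f"1000 messages: {many_small:.6f} s")
```

Under these assumed numbers the many-small-messages case is dominated by latency rather than bandwidth, which is why solvers are often structured to exchange fewer, larger messages.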