Class HPC project

My high school started a tech lab where students like myself can take apart computers, build circuit boards, learn to program and lots more.
I got the job of building a cluster from 4 old workstations we have. This is just a trial; if it works well we can get more workstations.

We have one monitor per workstation, and from what I have seen so far they each have one DVI output and an Intel Core 2 Duo CPU.
They will only be able to communicate with each other over a private network; the school will not let us connect our project to the network for Internet access unless this trial goes well.
I have built computers before. I have worked with Windows Server a bit.
I know PHP, Java, JavaScript, CSS, and HTML. So I have a pretty good background in computers.

We want to do something that will interest people enough to come in and take a look.

I have never done something like this before. Is there a good tutorial series somewhere? I was thinking about using Ubuntu Server. And what is a good program to run?

I wanted to find a prime number search program, and someone else's idea was a physics simulation, but where do I find programs like that to run?

To get hands-on advice you really will have to fill out the template required for classwork/homework. But since you seem to need to know some basics about high-performance computing, I might as well give you some starters which will help you refine your Internet research.

Not that this will help you any in this project. Sorry, but this is a completely different world.

Hmm, that is a hard one. I have written software for the simulation of electron movement in crystal lattices, but this looks flashy only in movies. In reality it produces an endless stream of data which looks about as interesting as the telephone book of New York City if you do not know anybody there.

Prime number search (or, likewise, integer factorisation) is indeed a good example of a parallel application. Again: if you are looking for something flashy, look elsewhere. The run of such a program is about as interesting to look at as watching paint dry.
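To make the "parallel" part concrete, here is a minimal sketch (not a tuned prime searcher) of how such a job can be cut into independent pieces. It assumes Python 3 and uses only the standard library; the function names are made up for this illustration. It spreads the work over the cores of a single machine. On a real cluster the same split would be handed out over the network, for example with MPI, which is not shown here.

```python
from concurrent.futures import ProcessPoolExecutor


def is_prime(n):
    """Trial division: fine for a demo, far too slow for record hunting."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def count_primes(bounds):
    """Count the primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(lo, hi, parts):
    """Cut [lo, hi) into `parts` contiguous chunks of nearly equal size."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def count_primes_parallel(lo, hi, workers=4):
    """Each worker process counts one chunk; the partial counts are summed."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))
```

To try it, call something like `print(count_primes_parallel(0, 200_000))` from under an `if __name__ == "__main__":` guard, which the process pool needs on some platforms. Note that equally sized chunks are not equal amounts of work, because testing larger numbers takes longer; the chunk holding the largest numbers will finish last.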

OK, here is a short introduction to supercomputing:

Let us first talk about its motivation: suppose you need to lift a log. In India elephants are used for this purpose. Alas, there is a problem with that: the bigger the log, the bigger the elephant you need to lift it, and they only come up to a certain size. Now, there is an alternative: ants! A single ant cannot lift all that much compared to an elephant, but their sheer numbers make up for that. Ants need to be trained to pull together in one direction, whereas the elephant will always go wholly in the same direction naturally, and the coordination of the ants takes somewhat away from their strength, so that 10 ants will not be 10 times as strong as a single one. But: once you manage to coordinate 1 million of them you could just as well coordinate 10 or 100 million of them. They might be only 5 (or 50) times as powerful as the 1 million, but there are no elephants even 5 or 50 times as big as the known ones.

Processor design has two fundamental limitations: first, the speed of light. This limits the size of a chip because signals have to be able to get from one side to the other within one clock cycle. Electrical signals travel very fast (at a sizeable fraction of the speed of light), but not infinitely fast.

The second limitation is the clock rate. As a rule of thumb, the raw thermal output of a CPU increases with roughly the square of its clock rate (the exact relation depends on the supply voltage needed to reach that clock rate): double the clock rate and you about quadruple the heat produced. Because Intel successfully made everybody believe that clock rate has something to do with processing power (which isn't the case at all, btw., but "4GHz" is an easier message than "Floating Point operations, Integer performance, queue length, ...") every processor manufacturer has revved up the clock speed into the GHz range, and we now need heat pipes, fans and whatnot to cool these clock-speed monsters.

For these reasons certain levels of processing power cannot be reached by single processors (the elephants) but only by coordinating many (maybe even smaller) processors (the ants) to work together on a single problem (the log).

This is done in several ways: Cray computers (more correctly: vector processors) were one design principle to achieve this. Basically it works like this: suppose you have an array with 50 numbers and want to multiply each of them by a constant. In a normal computer you will load the first array element, multiply it, store the result, load the second array element, ... A vector computer has specially designed registers where you can load all the 50 numbers at the same time, multiply them simultaneously and then store the 50 results in one step. The term for this is "SIMD": Single Instruction Multiple Data. This (along with "MIMD", Multiple Instruction Multiple Data) was not only used by Cray but also by NEC (the SX processor series), Fujitsu and Hitachi. The methods developed back then (up to the mid-80s) are still used in general-purpose processors.
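The difference between the two approaches can be modelled in a few lines. This is only a conceptual sketch in Python: the interpreter still loops internally, so nothing here runs in real vector hardware, and the register size `VECTOR_LENGTH` and the "3 instructions per step" accounting are assumptions made for the illustration. It merely counts how many load/multiply/store instructions each kind of machine would have to issue for the 50-number example.

```python
VECTOR_LENGTH = 64  # hypothetical number of elements one vector register holds


def scale_scalar(values, factor):
    """Scalar machine: one load, one multiply, one store per element."""
    result, instructions = [], 0
    for v in values:
        result.append(v * factor)
        instructions += 3  # load, multiply, store
    return result, instructions


def scale_vector(values, factor, width=VECTOR_LENGTH):
    """Vector machine model: one load, multiply, store per register-full."""
    result, instructions = [], 0
    for start in range(0, len(values), width):
        register = values[start:start + width]     # one vector load
        register = [v * factor for v in register]  # one vector multiply
        result.extend(register)                    # one vector store
        instructions += 3
    return result, instructions


numbers = list(range(50))
scalar_result, scalar_count = scale_scalar(numbers, 7)
vector_result, vector_count = scale_vector(numbers, 7)
print(scalar_count, vector_count)  # 150 instructions versus 3
```

Both functions return the same 50 results; only the number of instructions issued differs. With a narrower register (say `width=16`) the vector version needs four passes instead of one, which is why register width matters.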

The next development in supercomputing was massively parallel systems, which took the ant metaphor to an extreme. It started with the Goodyear MPP, but then came Inmos with its T400 and T800 transputer boards: a processor with serial links which could be used for inter-processor communication.

They didn't take off as deserved, but many principles developed back then are used in modern processors. One of the machines inspired by these was the IBM SP/2. You might remember the first computer to beat a reigning world champion (Garry Kasparov) in chess - that was the SP/2. The SP/2 consisted of rack-mounted RS/6000 workstations connected via a so-called "High Performance Switch" based on wormhole routing. Many companies misused the SP/2 as a management platform, but a few universities and other organisations (the "Deutscher Wetterdienst" (German weather authority), for instance) used the SP/2 as the high-performance platform it was intended to be.

At last clusters took over: most of the massively parallel machines were custom-built and one had to learn the system from the ground up to be productive with it. Clusters offered similar functionality but were built on standard toolsets, so that expertise was easier (and cheaper) to acquire. The difference to massively parallel computers is that the parallel processors are not built directly into one computer; instead the components are run-of-the-mill computers (usually Intel-compatible PCs) and the coordination is done via standard network interfaces (Ethernet).

A cluster is a special form of "distributed computing", and one of the first projects that took off utilized a certain variant called "cycle scavenging" (using otherwise unused resources in a network): SETI@home. Since then many other projects (Human Genome Project, GIMPS, Bitcoin, ...) have used similar techniques.
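The basic pattern behind cycle scavenging is that the problem is cut into small "work units" and every machine that happens to be idle fetches the next one. Here is a rough sketch of that pattern, assuming Python 3. Threads stand in for the volunteer machines, and the names `run_work_units`, `process` and `volunteers` are invented for this example; a real project would do the hand-out over the network and would also have to deal with volunteers that never report back.

```python
import queue
import threading


def run_work_units(work_units, process, volunteers=3):
    """Hand out work units to idle volunteers until none are left."""
    todo = queue.Queue()
    for unit in work_units:
        todo.put(unit)

    results = {}
    lock = threading.Lock()

    def volunteer():
        while True:
            try:
                unit = todo.get_nowait()  # an idle machine asks for work
            except queue.Empty:
                return                    # nothing left to do
            answer = process(unit)        # crunch the unit locally
            with lock:
                results[unit] = answer    # report the result back

    threads = [threading.Thread(target=volunteer) for _ in range(volunteers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Example: ten tiny work units, squared by four simulated volunteers.
squares = run_work_units(range(10), lambda n: n * n, volunteers=4)
print(squares[9])  # 81
```

Because every volunteer pulls a new unit as soon as it is done with the previous one, fast and slow machines are kept busy without anybody having to plan the distribution in advance. The work units are used as dictionary keys here, so they have to be hashable.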

Since 2007 it is common to call this "cloud computing", but in fact this is just old wine in new bottles. All these techniques have developed over time and offer solutions for problems as old as the IT business. Namely, how to increase processing capabilities and overcome natural limitations.

I hope this helps.

bakunin

First, I would like to say sorry: my post got moved, so I hadn't filled out the template. And thank you for the very detailed reply. After some research on some other forums I came up with Ubuntu MAAS. Is this something I could use? Also, when I said interesting, I was looking for something along the lines of a graphical molecular simulation. And do I need to find applications built for clusters, or can I write a simple program in C or Java to find prime numbers and run it on the cluster? Thanks again, bakunin, for all the time you put into that reply.

Do not post classroom or homework problems in the main forums. Homework and coursework questions can only be posted in this forum under special homework rules.

Please review the rules, which you agreed to when you registered, if you have not already done so.

More than likely, posting homework in the main forums has resulted in a forum infraction. If you did not post homework, please explain the company you work for and the nature of the problem you are working on.

If you did post homework in the main forums, please review the guidelines for posting homework and repost.

Thank You.

The UNIX and Linux Forums.
