Newbie question

Dear all,

I have a question about parallel programming, and it would be really great if you could give me some hints on how to deal with it.

I would like to run a small application on a supercomputer with 128 CPUs. Unfortunately, on this machine only jobs which require 32 CPUs are allowed to run. My applications, however, scale only up to 16 CPUs. If I request 32 CPUs, the job will not run, as I do not have enough data to fill the CPUs.

I think it may be possible to submit, instead of one job on 32 CPUs, two jobs, each on 16 CPUs. This would mean that a master script (in MPI, probably) asks for 32 CPUs at the beginning and then divides this partition into two smaller ones (of 16 CPUs each), starting a different application (from a different directory) on each of them.

As I have no experience with parallel programming, I do not know how to do this. Could you give me some hints on how to do it? (I do not expect anyone to do the work for me, but I need some hints to get started, such as possible keywords to use.)

Many thanks for any answer.

With all the best wishes,
Eduard

Disclaimer: It's been a while since I have used parallel machines so I cannot remember details, but since nobody else has answered yet I'll have a go.

Now, your supercomputer might be set up differently, but every one I have ever used uses a job submission system like PBS or NQS. You write a script with commands for the particular job submission system you are using (N.B. MPI is the parallel communications library, not a scripting language). You need to check whether this system allows you to subdivide a 32-node partition and run a separate job on each subdivision.
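If the subdividing has to happen inside a single 32-process MPI job instead, the usual keyword is a communicator split (`MPI_Comm_split` in C, `comm.Split(color, key)` in mpi4py): every process computes a "color", and processes with the same color end up in the same sub-communicator. Below is a rough sketch of just that bookkeeping in plain Python, with no MPI needed to run it. The function name `place_rank` and the directory names `run_a` and `run_b` are made up for illustration, and I am assuming each application can be made to use a sub-communicator rather than the global one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    color: int       # which sub-job this process belongs to
    local_rank: int  # rank of the process inside its sub-job
    workdir: str     # directory the sub-job should run in


def place_rank(world_rank, group_size=16, workdirs=("run_a", "run_b")):
    """Map a global rank to a sub-job (hypothetical helper, not an MPI call).

    The directory names are placeholders; substitute your own.
    """
    if world_rank < 0:
        raise ValueError("rank must be non-negative")
    color = world_rank // group_size  # ranks 0-15 -> 0, ranks 16-31 -> 1
    if color >= len(workdirs):
        raise ValueError("more groups than working directories")
    return Placement(color, world_rank % group_size, workdirs[color])


if __name__ == "__main__":
    # Show where a few of the 32 ranks would end up.
    for rank in (0, 15, 16, 31):
        print(rank, place_rank(rank))
```

In a real MPI program each process would pass its `color` (and `local_rank` as the key) to the split call, change into `workdir`, and then run its application on the resulting 16-process communicator.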

If this is impossible, and you cannot convince the admin to allow 16-node partitions, then you will probably have to change your program logic to run on 32 nodes, which means learning a little bit about parallel programming. Start by finding out where the magic number 16 comes from in your program and how the data is decomposed over the nodes, and then try to figure out how to generalise this to 32 (or any number of) nodes.
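To give an idea of what "generalising the decomposition" can look like, here is a small sketch of a one-dimensional block decomposition that works for any number of processes. I do not know how your program actually splits its data, so this is only an assumption about the simplest case: `n_items` independent work items divided as evenly as possible. The function name `block_range` is made up.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split as evenly as possible; when the split is uneven, the
    first (n_items % n_ranks) ranks get one extra item each.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # The same call works for 16 or 32 processes, with no magic number.
    for n_ranks in (16, 32):
        print(n_ranks, [block_range(100, n_ranks, r) for r in range(n_ranks)])
```

If there are fewer items than processes, some ranges come out empty, which is exactly the "not enough data to fill the CPUs" situation; the program then has to cope with idle processes rather than fail.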