Help making this Fortran code more efficient (in an HPC manner)

Hi there,

I have been given some Fortran code to modify. It was evidently written without high performance computing in mind, and it is not parallelized. Now I would like to get the code "on track" and make it parallel. After a whole afternoon of thinking, I still cannot figure out where to start. Can anyone help me optimize and parallelize the code? Thank you very much.

Sharp

DO I=1,N
   DO J=1,N
      XXX = 0.0D+00
      DO K=1,N
         DO L=1,N
            XXX = XXX + C(K,I)*CABM(K,L)*C(L,J)
         ENDDO
      ENDDO
      IF (I.EQ.J) XXX = XXX - 1.0D0
   ENDDO
ENDDO

This one's not too hard. You're summing up all the subexpressions over I, J, K, and L. Each iteration can be done on a separate node and "reduced" to a single sum.

Perhaps the easiest way would be to parallelize the outermost loop, splitting the task among N processors and summing each result.

Do you have an MPI environment?

Hi, yes, I do have an MPI environment. However, I am more familiar with OpenMP than with MPI. After a few days of thinking, I believe I have parallelized the code successfully using OpenMP. And you were right: the best performance I got came from parallelizing the outermost loop. Thank you for your help.
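
For anyone reading along, here is a minimal sketch of what parallelizing the outermost loop with OpenMP might look like. This is not the exact code I ended up with. The matrix size N, the identity test data in C and CABM, and the result array R are all assumptions added for illustration, since the snippet above does not show where XXX is used afterwards.

```fortran
! Minimal sketch: OpenMP parallelization of the outermost loop.
! ASSUMPTIONS (not in the original post): the size N, the test data
! in C and CABM, and the result array R are hypothetical.
PROGRAM OMP_SKETCH
   IMPLICIT NONE
   INTEGER, PARAMETER :: N = 50
   DOUBLE PRECISION :: C(N,N), CABM(N,N), R(N,N)
   DOUBLE PRECISION :: XXX
   INTEGER :: I, J, K, L

   ! Hypothetical test data: identity matrices, so every R(I,J)
   ! should come out as zero.
   C = 0.0D0
   CABM = 0.0D0
   DO I = 1, N
      C(I,I) = 1.0D0
      CABM(I,I) = 1.0D0
   ENDDO

   ! Each thread handles a share of the I iterations. The loop
   ! indices and XXX are private so threads do not overwrite each
   ! other's values; C, CABM and R are shared. No two threads write
   ! the same R(I,J), so no reduction clause is needed here.
   !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(I, J, K, L, XXX)
   DO I = 1, N
      DO J = 1, N
         XXX = 0.0D0
         DO K = 1, N
            DO L = 1, N
               XXX = XXX + C(K,I)*CABM(K,L)*C(L,J)
            ENDDO
         ENDDO
         IF (I .EQ. J) XXX = XXX - 1.0D0
         R(I,J) = XXX
      ENDDO
   ENDDO
   !$OMP END PARALLEL DO

   ! Simple sanity check on the hypothetical test data.
   PRINT *, 'max |R| = ', MAXVAL(ABS(R))
END PROGRAM OMP_SKETCH
```

With gfortran this should build with the -fopenmp flag; without that flag the directives are treated as comments and the program runs serially, which makes it easy to compare results. The key point is that XXX must be private, otherwise the threads race on it.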

As I understand it, OpenMP is limited to parallelization on a single shared-memory node. But I may be several years out of date there.