Could you run this benchmark? (Especially if you have a V880 or V890)

Hi all

I am currently using a T5120 to write and run simulation code that relies heavily on large matrix multiplications. I'm posting to ask some of you to run a simple benchmark I've written and share the results, so I can compare the matrix multiplication performance of the T5120 against other machines running Solaris (mostly SPARC). I am interested to see how the T2 fares against other processors on this kind of operation.

Here is the code:

subroutine fillMat(Mat,n)
  implicit none
  integer :: i, j
  integer :: n
  real*8 :: Mat(n,n)
  do j=1,n,1
    do i=1,n,1
      call random_number(Mat(i,j))
    end do
  end do
end subroutine fillMat

program main
  implicit none
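  integer :: n
  integer :: nseed
  integer, allocatable :: seed(:)
  integer :: c1, c2, count_rate, count_max
  real*8 :: alpha, beta
  real*8, allocatable :: A(:,:), B(:,:), C(:,:)
  external dgemm
  ! Seed the generator reproducibly. The seed array length is
  ! compiler-dependent, so query it instead of assuming 30.
  call random_seed(size=nseed)
  allocate(seed(nseed))
  seed=1
  call random_seed(put=seed)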
  do n=100,4900,300
    allocate(A(n,n),B(n,n),C(n,n))
    call fillMat(A,n); call fillMat(B,n); call fillMat(C,n)
    call random_number(alpha); call random_number(beta)
    ! Time only the dgemm call (wall-clock ticks converted to seconds)
    CALL SYSTEM_CLOCK(c1,count_rate,count_max)
    call dgemm('N','N',n,n,n,alpha,A,n,B,n,beta,C,n)
    CALL SYSTEM_CLOCK(c2,count_rate,count_max)
    write(*,*) n, dble(c2-c1)/dble(count_rate)
    deallocate(A,B,C)
  end do
end program main
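
For anyone unfamiliar with the BLAS call, here is a plain-loop sketch of what dgemm('N','N',n,n,n,alpha,A,n,B,n,beta,C,n) computes: C := alpha*A*B + beta*C with no transposes, assuming square n x n matrices as in the benchmark. The names naive_gemm and naive_dgemm_nn are hypothetical and not part of the Sun Performance Library; this is only a reference for the arithmetic and would be far slower than the tuned library routine.

```fortran
! Hypothetical reference module: spells out C := alpha*A*B + beta*C
! for square n x n matrices (the 'N','N' case of DGEMM).
module naive_gemm
  implicit none
contains
  subroutine naive_dgemm_nn(n, alpha, A, B, beta, C)
    integer, intent(in) :: n
    double precision, intent(in) :: alpha, beta
    double precision, intent(in) :: A(n,n), B(n,n)
    double precision, intent(inout) :: C(n,n)
    integer :: j, k
    do j = 1, n
      ! Scale column j of C by beta first (unlike reference BLAS,
      ! this does not special-case beta = 0)
      C(:,j) = beta * C(:,j)
      ! Accumulate alpha * A * B(:,j) one column of A at a time,
      ! which walks memory in Fortran's column-major order
      do k = 1, n
        C(:,j) = C(:,j) + (alpha * B(k,j)) * A(:,k)
      end do
    end do
  end subroutine naive_dgemm_nn
end module naive_gemm
```

To sanity-check the library call, one could run both routines on copies of the same small matrices (say n = 100) and compare the results within a tolerance.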

I compiled this using:

sunf90 -fast -xvector -m64 -library=sunperf -fopenmp main.f90 -o main.run

and running it with:

export OMP_NUM_THREADS=64

I get the following results:

 100 0.025336
 400 0.016322
 700 0.076462
 1000 0.215403
 1300 0.474393
 1600 0.872482
 1900 1.465775
 2200 2.251058
 2500 3.353732
 2800 4.661241
 3100 6.33447
 3400 8.322112
 3700 10.744999
 4000 13.514226
 4300 16.897237
 4600 20.572952
 4900 24.986237

The first column is the matrix size n (A, B and C are all n x n) for the DGEMM operation C = alpha*A*B + beta*C, and the second column is the elapsed time in seconds.
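
Since times for different n are hard to compare across machines directly, it may help to convert each row to GFLOP/s using the conventional count of 2*n**3 floating-point operations for an n x n matrix multiply (ignoring the O(n**2) work for the alpha and beta scaling). The module and function names below (gemm_rate, gemm_gflops) are hypothetical helpers, not part of the original benchmark.

```fortran
! Hypothetical helper: converts a DGEMM timing into GFLOP/s using the
! usual 2*n**3 operation count (alpha/beta scaling is ignored).
module gemm_rate
  implicit none
contains
  pure function gemm_gflops(n, seconds) result(gflops)
    integer, intent(in) :: n
    double precision, intent(in) :: seconds
    double precision :: gflops
    ! Convert n to double first so n**3 cannot overflow a default integer
    gflops = 2.0d0 * dble(n)**3 / seconds / 1.0d9
  end function gemm_gflops
end module gemm_rate
```

Applied to the table above, the n = 1000 row (0.215403 s) works out to roughly 9.3 GFLOP/s and the n = 4900 row (24.986237 s) to roughly 9.4 GFLOP/s.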

Thank you all in advance.

I have neither a V440 nor a Fortran compiler.
But you can get a rough figure by comparing the clock speeds. Note that the Niagara (e.g. the T5 series) uses half the clock speed on the CPU cores, so, for example, a 3.0 GHz Niagara should compute roughly like a 1.5 GHz V440.
However, the Niagara can run more computations (e.g. benchmarks) in parallel while each one still runs at a high speed.