Hi all,
I am currently using a T5120 to write and run some simulation code I've been working on, which relies heavily on large matrix multiplications. I am posting to ask whether some of you could run a simple benchmark I've written and share the results, so that I can compare the matrix multiplication performance of the T5120 against other machines running Solaris (mostly SPARC). I am interested to see how the T2 fares against other processors on this kind of operation.
Here is the code:
! Fill an n-by-n matrix with uniform random numbers in [0,1).
subroutine fillMat(Mat,n)
  implicit none
  integer :: i, j
  integer :: n
  real*8 :: Mat(n,n)
  do j=1,n,1
    do i=1,n,1
      call random_number(Mat(i,j))
    end do
  end do
end subroutine fillMat

program main
  implicit none
  integer :: n, nseed
  integer, allocatable :: seed(:)
  integer :: c1, c2, count_rate, count_max
  real*8 :: alpha, beta
  real*8, allocatable :: A(:,:), B(:,:), C(:,:)

  ! Use a fixed seed so every run multiplies the same matrices.
  ! The seed length is compiler dependent, so query it first.
  call random_seed(size=nseed)
  allocate(seed(nseed))
  seed=1
  call random_seed(put=seed)

  do n=100,4900,300
    allocate(A(n,n),B(n,n),C(n,n))
    call fillMat(A,n); call fillMat(B,n); call fillMat(C,n)
    call random_number(alpha); call random_number(beta)

    ! Time only the DGEMM call: C = alpha*A*B + beta*C
    call system_clock(c1,count_rate,count_max)
    call dgemm('N','N',n,n,n,alpha,A,n,B,n,beta,C,n)
    call system_clock(c2,count_rate,count_max)

    ! Matrix size and elapsed wall-clock time in seconds
    write(*,*) n, dble(c2-c1)/dble(count_rate)
    deallocate(A,B,C)
  end do
end program main
I compiled this using:
sunf90 -fast -xvector -m64 -library=sunperf -fopenmp main.f90 -o main.run
And with:
export OMP_NUM_THREADS=64
I get the following results:
100 0.025336
400 0.016322
700 0.076462
1000 0.215403
1300 0.474393
1600 0.872482
1900 1.465775
2200 2.251058
2500 3.353732
2800 4.661241
3100 6.33447
3400 8.322112
3700 10.744999
4000 13.514226
4300 16.897237
4600 20.572952
4900 24.986237
Where the first column is the matrix size n for the DGEMM operation C = alpha*A*B + beta*C, with A(n,n), B(n,n) and C(n,n), and the second column is the time in seconds.
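If you would rather compare rates than raw times, the timings can be converted to GFLOP/s. The snippet below is only a rough sketch: it assumes the conventional 2*n**3 floating-point operation count for DGEMM (ignoring the lower-order alpha/beta scaling terms), and the name dgemm_gflops is a made-up helper, not part of the benchmark above or of sunperf. The two sample calls use the n=1000 and n=4900 timings from the list above.

```fortran
! Hypothetical helper, not part of the original benchmark.
! Assumes the conventional 2*n**3 operation count for DGEMM.
program gflops_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Timings taken from the T5120 results listed above
  write(*,'(a,f8.3)') 'n=1000 GFLOP/s: ', dgemm_gflops(1000, 0.215403_dp)
  write(*,'(a,f8.3)') 'n=4900 GFLOP/s: ', dgemm_gflops(4900, 24.986237_dp)

contains

  ! Convert a matrix size and an elapsed time in seconds to GFLOP/s.
  function dgemm_gflops(n, seconds) result(rate)
    integer, intent(in) :: n
    real(dp), intent(in) :: seconds
    real(dp) :: rate
    ! Convert n to double before cubing to avoid integer overflow
    rate = 2.0_dp * real(n, dp)**3 / seconds / 1.0e9_dp
  end function dgemm_gflops

end program gflops_demo
```

Reporting the rate this way makes results from different matrix sizes and different machines easier to compare side by side.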
Thank you all in advance.