I'm writing a matrix multiplication subroutine in Fortran. I'm using the Intel Fortran compiler. I've written a simple static scheduled parallel do-loop. Unfortunately, it's running on only one thread. Here's the code:
SUBROUTINE MATMULT(A,B,C,L,M,N)
REAL*8 A,B,C
INTEGER NCORES, CHUNK, TID
DIMENSION A(L,N),B(L,M),C(M,N)
PARAMETER (NCORES=8)
CHUNK=(L/(NCORES+1))+1
TID=0
!$OMP PARALLELDO SHARED(A,B,C,L,M,N,CHUNK) PRIVATE(I,J,K,TID)
!$OMP+DEFAULT(NONE) SCHEDULE(STATIC,CHUNK)
DO I=1,L
TID = OMP_GET_THREAD_NUM()
PRINT *, "THREAD ", TID, " ON I=", I
DO K=1,N
DO J=1,M
A(I,K) = A(I,K) + B(I,J)*C(J,K)
END DO
END DO
END DO
!$OMP END PARALLELDO
RETURN
END
Note:
- There are no parallel directives in the main program that calls the routine
- The arrays A,B,C are initialized serially in the main program. A is initialized to zeros
- I am enforcing the Fortran fixed source form during compilation
I have confirmed the following:
- Another example program works fine with 8 threads (so no hardware issue)
- I have used the -openmp compiler argument
- OMP_GET_NUM_PROCS() and OMP_GET_MAX_THREADS() both return 0
- TID is 0 for every iteration over I (which shouldn't be the case)
I am unable to diagnose my mistake. I'd appreciate any inputs on this.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…