parallel processing - Could a CUDA kernel call a cublas function?

Question


asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)


I know it sounds weird, but here is my scenario:

I need to do a matrix-matrix multiplication (A(n*k)*B(k*n)), but I only need the diagonal elements of the output matrix. I searched the cuBLAS library and didn't find any level 2 or level 3 function that does that. So I decided to assign each row of A and each column of B to a CUDA thread. For each thread (idx), I need to calculate the dot product "A[idx,:]*B[:,idx]" and save it as the corresponding diagonal output. Since this dot product also takes some time, I wonder whether I could somehow call a cuBLAS function here (say cublasSdot) to compute it.

If I missed some cublas function that can achieve my goal directly (only calculate the diagonal elements for a matrix-matrix multiplication), this question could be discarded.



1 Answer

深蓝 · Answer 1 · 2021-10-23T19:02:58+0000

Yes, it can, but only in CUDA versions before CUDA 10.

"The language interface and Device Runtime API available in CUDA C/C++ is a subset of the CUDA Runtime API available on the Host. The syntax and semantics of the CUDA Runtime API have been retained on the device in order to facilitate ease of code reuse for API routines that may run in either the host or device environments. A kernel can also call GPU libraries such as CUBLAS directly without needing to return to the CPU." Source

Here you can see an example of matrix-vector multiplication using CUDA and the cuBLAS library function cublasSgemv.

Bear in mind, however, that there is no longer a device cuBLAS capability in CUDA 10. To quote Robert_Crovella:

The current recommendation would be to see if CUTLASS 2 will help (it is mostly focused on GEMM related activities). If not, write your own code to perform the function, or call cublas from host code.
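As a rough illustration of the "call cublas from host code" option, the sketch below computes each diagonal element with one host-side cublasSdot call. It assumes row-major storage for both matrices; the function name diagOfProductCublas is hypothetical, and error checking is omitted for brevity. It is meant to show the indexing, not to be an efficient solution, since it issues n separate library calls.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: returns d[i] = dot(A[i,:], B[:,i]) for i in [0, n).
// Assumes A is n x k and B is k x n, both stored row-major on the host.
// Return codes are ignored here; real code should check them.
std::vector<float> diagOfProductCublas(const std::vector<float>& A,
                                       const std::vector<float>& B,
                                       int n, int k)
{
    float* dA = nullptr;
    float* dB = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    std::vector<float> d(n);
    for (int i = 0; i < n; ++i) {
        // Row i of A is contiguous (stride 1) and starts at offset i * k.
        // Column i of B starts at offset i and has stride n.
        // With the default host pointer mode, the result lands in d[i].
        cublasSdot(handle, k, dA + i * k, 1, dB + i, n, &d[i]);
    }

    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    return d;
}
```

For A = [[1,2],[3,4],[5,6]] (n = 3, k = 2) and B = [[1,2,3],[4,5,6]], this should return {9, 26, 51}. Building it requires linking against cuBLAS (for example with -lcublas).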

Nonetheless, there are currently several implementations of matrix-vector multiplication available online, for instance 1, 2, among others.
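For the specific problem in the question (only the diagonal of A*B), the "write your own code" route can be quite short. The following is a minimal sketch, not a tuned implementation: it assumes row-major storage, uses one thread per diagonal element as the question describes, and the names diagOfProductKernel and diagOfProduct are hypothetical. Error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One thread per diagonal element: d[i] = sum_j A[i][j] * B[j][i].
// Assumes A is n x k and B is k x n, both row-major.
__global__ void diagOfProductKernel(const float* A, const float* B,
                                    float* d, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // guard threads past the last diagonal element

    float acc = 0.0f;
    for (int j = 0; j < k; ++j) {
        acc += A[i * k + j] * B[j * n + i];
    }
    d[i] = acc;
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, and copies the n diagonal values back. Return codes are ignored.
std::vector<float> diagOfProduct(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 int n, int k)
{
    float *dA = nullptr, *dB = nullptr, *dD = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dD, n * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    diagOfProductKernel<<<blocks, threads>>>(dA, dB, dD, n, k);

    std::vector<float> d(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(d.data(), dD, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dD);
    return d;
}
```

With A = [[1,2],[3,4],[5,6]] and B = [[1,2,3],[4,5,6]], diagOfProduct(A, B, 3, 2) should yield {9, 26, 51}. In this layout, neighbouring threads read adjacent elements of B but strided elements of A, so storing A transposed (or reducing each dot product across a warp) would likely improve memory throughput for large k.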
