Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
235 views
in Technique[技术] by (71.8m points)

Which is the correct way of allocating shared memory statically in cuda and why?

Out of the following two ways of allocating shared memory statically, which method is correct and why? I get same results for both but I am trying to understand the behavior in a little more detail.

Kernel 1:

__shared__ int as[3][3],bs[3][3];
__global__ void Sharesum(int* a,int* b,int* c,int n)
{
    int s,k,i,sum=0;
    int tx,ty,bx,by;
    tx=threadIdx.x;
    ty=threadIdx.y;
    as[ty][tx]=a[tx+n*ty];
    bs[ty][tx]=b[tx+n*ty];
    sum += as[ty][tx]+bs[ty][tx];
    c[tx*n+ty]=sum;
}

kernel 2:

    __global__ void Sharesum(int* a,int* b,int* c,int n)
{
    __shared__ int as[3][3],bs[3][3];

    int s,k,i,sum=0;
    int tx,ty,bx,by;
    tx=threadIdx.x;
    ty=threadIdx.y;
    as[ty][tx]=a[tx+n*ty];
    bs[ty][tx]=b[tx+n*ty];
    sum += as[ty][tx]+bs[ty][tx];
    c[tx*n+ty]=sum;
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

There shouldn't be any difference between these two methods for what you have shown. I'm not sure there is an answer that suggests that one is "correct" and one isn't.

However, the first one, which we could call a "global scope" declaration, affects all kernels defined in the module. That means all kernels will reserve and have available shared allocations according to the global definition.

The second one only affects the kernel it is scoped to.

Either one or both could be correct, depending on your desired intent.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...