In the 4D case, you are simply using the API incorrectly. OpenCL does not support an arbitrary number of global / local dimensions: work_dim is limited to CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, which in practice means up to 3.
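If you really need a fourth dimension, one common workaround is to fold two logical dimensions into a single NDRange dimension and split them again inside the kernel. The sketch below shows only the index arithmetic, in plain C; the helper name split_folded_id, the parameter dim3, and the choice of folding into dimension 2 are assumptions made for illustration and are not taken from your code.

```c
#include <assert.h>
#include <stddef.h>

/* Split a folded id into (k, l), assuming the host launched
   dim3 * dim4 work-items in the folded dimension and that k is the
   fastest-varying of the two logical indices. */
static void split_folded_id(size_t id2, size_t dim3, size_t *k, size_t *l)
{
    assert(dim3 > 0);   /* avoid division by zero */
    *k = id2 % dim3;    /* position inside the third logical dimension */
    *l = id2 / dim3;    /* position inside the fourth logical dimension */
}
```

Inside a kernel the same arithmetic would be applied to the value returned by get_global_id(2), with the host enqueueing dim3 * dim4 work-items in that dimension.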
In the 2D case, your indexing seems wrong. Assuming row-major arrays, it should be i + j * width, not i + j * height.
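To see why the stride must be the width, here is a minimal host-side sketch of the row-major formula. The helper name index_2d is hypothetical. With width 4 and height 2, using the height as the stride would map element (i=0, j=1) to 0 + 1 * 2 = 2, which collides with element (i=2, j=0).

```c
#include <assert.h>
#include <stddef.h>

/* Row-major linear index of column i, row j in a 2D array that is
   `width` columns wide. */
static size_t index_2d(size_t i, size_t j, size_t width)
{
    assert(i < width);      /* i is the column, so it must stay below width */
    return i + j * width;   /* each full row advances the index by width */
}
```

With this formula, consecutive columns in a row are adjacent in memory and each new row starts exactly width elements after the previous one.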
In the 3D case, the indexing inside the kernel seems OK, assuming row-major memory layout and that dim1 equals cols (width) and dim2 equals rows (height). However, your question lacks the context needed to say more:
- Input buffer allocation and initialization.
- Kernel invocation code (parameters, workgroup size and global size).
- Result collection and synchronization.
- Buffer sizes: you could be accessing beyond the allocated buffer size, which should be checked.
Doing any of these steps incorrectly can easily lead to unexpected results, even if your kernel code is OK.
If you wish to debug indexing issues, the easiest thing to do is to write a simple kernel that outputs the calculated index:
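__kernel void test1(__global int* c, const int dim1, const int dim2) {
    int i = get_global_id(0);
    int j = get_global_id(1);
    int k = get_global_id(2);
    // row-major linear index: i varies fastest, then j, then k
    int idx = i + dim1 * j + dim1 * dim2 * k;
    // each element stores its own index, so the buffer should read 0, 1, 2, ...
    c[idx] = idx;
}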
You should then expect a result with linearly increasing values. I would start with a single workgroup and then move on to using multiple workgroups.
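Once the buffer has been read back to the host, the check can be automated. The following is a minimal sketch in plain C; first_mismatch is a hypothetical helper, and it assumes the result has already been copied into a host array of count ints.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the position of the first element that does not hold its own
   index, or `count` if every element matches. */
static size_t first_mismatch(const int *c, size_t count)
{
    assert(c != NULL);
    for (size_t n = 0; n < count; ++n) {
        if (c[n] != (int)n) {
            return n;   /* indexing (or the launch size) went wrong here */
        }
    }
    return count;       /* buffer holds 0, 1, 2, ..., count - 1 */
}
```

A return value smaller than count points at the first element that was written with the wrong index or never written at all, which usually narrows the problem down to one dimension.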
Also, if you perform a simple element-wise operation between arrays, it is much simpler to use 1D indexing. You could simply use a 1D workgroup and a global size that equals the number of elements (rounded up to a multiple of the workgroup size):
__kernel void test1(__global int* a, __global int* b, __global int* c, const int total) {
    // no need for complex indexing for elementwise operations
    int idx = get_global_id(0);
    if (idx < total)
    {
        c[idx] = a[idx] + b[idx];
    }
}
You would probably set local_work_size to the maximum size the hardware allows (for instance 512 for Nvidia, 256 for AMD; the actual limit can be queried with clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE) and global_work_size to the total number of elements rounded up to a multiple of local_work_size. See clEnqueueNDRangeKernel.
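The rounding itself is a one-liner. Here is a small sketch in plain C; round_up_global is a hypothetical helper name, and the workgroup size of 256 used in the note below is only an example value.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of `local` that is greater than or equal to `total`. */
static size_t round_up_global(size_t total, size_t local)
{
    assert(local > 0);  /* a workgroup must contain at least one work-item */
    return ((total + local - 1) / local) * local;
}
```

For example, 1000 elements with a workgroup size of 256 give a global size of 1024. The extra 24 work-items are the reason the kernel needs the idx < total guard.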
2D and 3D dimensions are usually used for operations that access adjacent elements in 2D / 3D space, such as image convolutions.