Appendix D of the 3.2 version of the CUDA documentation refers to C++ support in CUDA device code.
It is clearly mentioned that CUDA supports "Classes for devices of compute capability 2.x". However, I'm working with devices of compute capability 1.1 and 1.3 and I can use this feature!
For instance, this code works:
#include <stdint.h>  // for uint32_t

// class definition, deliberately simplified
class Foo {
private:
    int x_;
public:
    __device__ Foo() { x_ = 42; }
    __device__ int bar() { return x_; }  // returns the stored value
};

// kernel using the class above
__global__ void testKernel(uint32_t* ddata) {
    Foo f;
    ddata[threadIdx.x] = f.bar();
}
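To reproduce this, a minimal host-side harness might look like the sketch below. It is an illustration, not the original test program: the block size `kThreads`, the buffer name `hdata`, and the use of device 0 are all assumptions. The class is the one from the question, with `bar()` declared to return `int` so that it compiles. The program prints the compute capability of the device, launches the kernel, and checks that every thread wrote 42.

```cuda
#include <cstdio>
#include <stdint.h>
#include <cuda_runtime.h>

// Simplified class from the question; bar() returns int.
class Foo {
private:
    int x_;
public:
    __device__ Foo() { x_ = 42; }
    __device__ int bar() { return x_; }
};

// Each thread constructs a Foo and stores its value.
__global__ void testKernel(uint32_t* ddata) {
    Foo f;
    ddata[threadIdx.x] = f.bar();
}

int main() {
    const int kThreads = 32;  // assumption: a single block of 32 threads

    // Report the compute capability of device 0 (assumed to be the target GPU).
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    uint32_t hdata[kThreads] = {0};
    uint32_t* ddata = NULL;
    if (cudaMalloc((void**)&ddata, sizeof(hdata)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    testKernel<<<1, kThreads>>>(ddata);
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hdata, ddata, sizeof(hdata), cudaMemcpyDeviceToHost);
    }
    cudaFree(ddata);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every thread should have written the value set in Foo's constructor.
    for (int i = 0; i < kThreads; ++i) {
        if (hdata[i] != 42u) {
            fprintf(stderr, "unexpected value at index %d\n", i);
            return 1;
        }
    }
    printf("all %d values are 42\n", kThreads);
    return 0;
}
```

On a toolkit that still supports these targets, building with something like `nvcc -arch=sm_11` or `nvcc -arch=sm_13` should target the 1.x devices mentioned above.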
I'm also able to use widely used libraries, such as the random number generation classes in thrust::random.
My only guess is that this works thanks to the automatic inlining of __device__ functions, but that does not explain how member variables are handled.
Have you ever used such features in the same conditions, or can you explain to me why my CUDA code behaves this way? Is there something wrong in the reference guide?