According to the documentation, thrust expects the type used for the summation (the accumulator) to be reflected in the type of the init value:
sum1 = thrust::reduce(dev_ptr, dev_ptr+NZ*NX, 0, thrust::plus<real>());
                                              ^
That constant is an integral type, so the reduction is carried out in integer arithmetic and the fractional part is truncated at each step. If you change it to a double-precision constant:
sum1 = thrust::reduce(dev_ptr, dev_ptr+NZ*NX, 0.0, thrust::plus<real>());
you get matching results between CPU and GPU, according to my testing. (You could alternatively cast your constant to real type, i.e. (real)0, and use that. There are other ways to address this as well, such as dropping the use of the init value and the binary op.)
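If it helps, here is a minimal, self-contained sketch of the effect. It is a sketch only: it uses a thrust::device_vector filled with 0.1 in place of your cudaMalloc'd buffer, assumes real is double, and picks example NZ/NX values.

#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdio>

typedef double real;   // assumption: real is double in your code

int main() {
    const int NZ = 100, NX = 100;                 // example sizes (assumption)

    // Fractional data makes the truncation visible.
    thrust::device_vector<real> d(NZ * NX, 0.1);
    thrust::device_ptr<real> dev_ptr = d.data();

    // Integral init: intermediate results are held in int, so the
    // fractional contributions are lost.
    real bad  = thrust::reduce(dev_ptr, dev_ptr + NZ*NX, 0,   thrust::plus<real>());

    // Floating-point init: accumulation happens in real, as intended.
    real good = thrust::reduce(dev_ptr, dev_ptr + NZ*NX, 0.0, thrust::plus<real>());

    printf("int init:    %f\n", bad);   // typically prints 0.000000
    printf("double init: %f\n", good);  // ~1000.000000, matching a CPU loop
    return 0;
}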