My code fails with an error message, and I am trying to track down the cause. To make the problem easier to find, I have stripped out everything that does not appear to contribute to the error. If you can tell me why the following simple code produces an error message, I think I should be able to fix my original code:
#include "cuComplex.h"
#include <cutil.h>
__device__ void compute_energy(void *data, int isample, int nsamples) {
    cuDoubleComplex * const nminusarray = (cuDoubleComplex*)data;
    cuDoubleComplex * const f = (cuDoubleComplex*)(nminusarray+101);
    double * const abs_est_errorrow_all = (double*)(f+3);
    double * const rel_est_errorrow_all = (double*)(abs_est_errorrow_all+nsamples*51);
    int * const iid_all = (int*)(rel_est_errorrow_all+nsamples*51);
    int * const iiu_all = (int*)(iid_all+nsamples*21);
    int * const piv_all = (int*)(iiu_all+nsamples*21);
    cuDoubleComplex * const energyrow_all = (cuDoubleComplex*)(piv_all+nsamples*12);
    cuDoubleComplex * const refinedenergyrow_all = (cuDoubleComplex*)(energyrow_all+nsamples*51);
    cuDoubleComplex * const btplus_all = (cuDoubleComplex*)(refinedenergyrow_all+nsamples*51);
    cuDoubleComplex * const btplus = btplus_all+isample*21021;
    btplus[0] = make_cuDoubleComplex(0.0, 0.0);
}
__global__ void computeLamHeight(void *data, int nlambda) {
    compute_energy(data, blockIdx.x, nlambda);
}
int main(int argc, char *argv[]) {
    void *device_data;
    CUT_DEVICE_INIT(argc, argv);
    CUDA_SAFE_CALL(cudaMalloc(&device_data, 184465640));
    computeLamHeight<<<dim3(101, 1, 1), dim3(512, 1, 1), 45000>>>(device_data, 101);
    CUDA_SAFE_CALL(cudaThreadSynchronize());
}
I am using a GeForce GTX 480 and I am compiling the code like so:
nvcc -L /soft/cuda-sdk/4.0.17/C/lib -I /soft/cuda-sdk/4.0.17/C/common/inc -lcutil_x86_64 -arch sm_13 -O3 -Xopencc "-Wall" Main.cu
The output is:
Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 31 : unspecified launch failure.
EDIT: I have now further simplified the code. The following simpler code still produces the error message:
#include <cutil.h>
__global__ void compute_energy(void *data) {
    *(double*)((int*)data+101) = 0.0;
}
int main(int argc, char *argv[]) {
    void *device_data;
    CUT_DEVICE_INIT(argc, argv);
    CUDA_SAFE_CALL(cudaMalloc(&device_data, 101*sizeof(int)+sizeof(double)));
    compute_energy<<<dim3(1, 1, 1), dim3(1, 1, 1)>>>(device_data);
    CUDA_SAFE_CALL(cudaThreadSynchronize());
}
Now it is easy to see that the offset should be valid. I tried running cuda-memcheck and it says the following:
========= CUDA-MEMCHECK
Using device 0: GeForce GTX 480
Cuda error in file 'Main.cu' in line 13 : unspecified launch failure.
========= Invalid __global__ write of size 8
========= at 0x00000020 in compute_energy
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x200200194 is misaligned
=========
========= ERROR SUMMARY: 1 error
I searched the internet for an explanation of what it means for an address to be misaligned, but I could not find one. What does it mean, and why does this write fail?
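For reference, here is one way to read the cuda-memcheck report, offered as a sketch rather than a definitive diagnosis. On CUDA devices, a global-memory access of N bytes (N = 1, 2, 4, 8 or 16) has to target an address that is a multiple of N. The simplified kernel writes an 8-byte double at byte offset 101*sizeof(int) from the start of the buffer, so the question becomes whether that offset is a multiple of 8. The helper names is_aligned and align_up below are made up for illustration and are not part of any CUDA API; the code assumes a 4-byte int and an 8-byte double.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a CUDA API): true when `address` is a multiple
// of `alignment`.
constexpr bool is_aligned(std::uint64_t address, std::size_t alignment) {
    return address % alignment == 0;
}

// Hypothetical helper (not a CUDA API): round `offset` up to the next
// multiple of `alignment`, which must be a power of two.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offset used by the simplified kernel: (int*)data + 101.
constexpr std::size_t raw_offset = 101 * sizeof(int);

// Address reported by cuda-memcheck in the output above.
constexpr std::uint64_t reported_address = 0x200200194ULL;

// Assumes 4-byte int and 8-byte double, as on the usual CUDA host platforms.
static_assert(raw_offset == 404, "101 ints occupy 404 bytes");
static_assert(!is_aligned(raw_offset, sizeof(double)),
              "404 is not a multiple of 8");
static_assert(!is_aligned(reported_address, sizeof(double)),
              "the reported address is not a multiple of 8");
static_assert(align_up(raw_offset, sizeof(double)) == 408,
              "the next 8-byte boundary after 404 is 408");
```

The snippet is host-only and compiles as plain C++ or inside a .cu file; the static_asserts are checked at compile time. Under this reading, the write is inside the allocation but lands 4 bytes past an 8-byte boundary (0x194 is 404). Padding the offset to 408 bytes and allocating accordingly, or storing the double before the ints, would give an aligned write. The same check with a 16-byte alignment would presumably apply to the cuDoubleComplex pointers in the original code.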