Recent questions tagged CUDA

0 votes

616 views

1 answer

cuda - When to call cudaDeviceSynchronize?

When is calling the cudaDeviceSynchronize function really needed? As far as I understand from the CUDA ... the program so much?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

611 views

1 answer

cuda - Calculating performance of CUFFT

I am running CUFFT on chunks (N*N/p) divided across multiple GPUs, and I have a question regarding calculating ... the performance of FFT? Thanks.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

438 views

1 answer

cuda - How does cudaMalloc work?

I am trying to modify the imageDenoising class in the CUDA SDK. I need to repeat the filter many times in case ... lead to many synchronization problems

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

697 views

1 answer

cuda - Texture memory-tex2D basics

While using texture memory I have come across the following code: uint f = (blockIdx.x * blockDim.x) + ... f? This confuses me. Thank you.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

528 views

1 answer

cuda - Is it possible to access a hard disk directly from the GPU?

Is it possible to access a hard disk or flash disk directly from the GPU (CUDA/OpenCL) and load/store content ... any suggestions about the design.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

626 views

1 answer

cuda - Detecting ptx kernel of Thrust transform

I have the following thrust::transform call: my_functor *f_1 = new my_functor(); thrust::transform(data.begin(), ... what are these other kernels?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

901 views

1 answer

cuda - About error code "invalid device function" by nvcc with compute_ and sm_ compile option

I hope you can help me figure out the correct compiler option required for the card below: > ./ ... quantitatively this speed up? Thanks

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

656 views

1 answer

cuda - How to perform Hadamard product with CUBLAS on complex numbers?

I need to compute the element-wise multiplication of two vectors (Hadamard product) of complex numbers with NVIDIA ... possible with CUBLAS).

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

606 views

1 answer

cuda thrust::remove_if throws "thrust::system::system_error" for device_vector?

I am currently using CUDA 7.5 under VS 2013. Today I needed to remove some of the elements from a ... , is there anything special?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

518 views

1 answer

cuda - Retaining dot product on GPGPU using CUBLAS routine

I am writing code to compute the dot product of two vectors using the CUBLAS dot product routine, but it returns the ... copy from CPU to GPGPU?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

534 views

1 answer

cuda - cuBLAS argmin -- segfault if outputting to device memory?

In cuBLAS, cublasIsamin() gives the argmin for a single-precision array. Here's the full function declaration: ... to device memory instead.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

870 views

1 answer

cuda - How to pass a flag to the nvcc compiler in CMake

I have a C project in CMake in which I have embedded a CUDA kernel module. I want to pass --ptxas-options=-v ... options=-v to my nvcc compiler?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

806 views

1 answer

cuda - Allocate 2D array with cudaMallocPitch and copying with cudaMemcpy2D

I'm new to CUDA. I appreciate your help and hope you can help me. I need to store multiple elements of ... (dev_matrix); cudaFree(dev_vector); }

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

685 views

1 answer

cuda - conditional syncthreads & deadlock (or not)

A follow-up question to: EarlyExit and DroppedThreads. According to the above links, the code below should deadlock. Please ... ) result= add[0]; }

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

673 views

1 answer

cuda - cudaMemcpyToSymbol vs. cudaMemcpy: why is cudaMemcpyToSymbol still around?

As stated by other questions and according to the link, you can ... .html#group__CUDART__MEMORY_1g9bcf02b53644eee2bef9983d807084c7

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

463 views

1 answer

cuda - IEEE-754 standard on NVIDIA GPU (sm_13)

If I perform a float (single-precision) operation on a host and a device (GPU arch sm_13), will the values be different?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

1.2k views

1 answer

cuda - Transpose matrix multiplication in cuBLAS howto

The problem is simple: I have two matrices, A and B, that are M by N, where M >> N. I want to first take ... ? What about lda, ldb, ldc? Thanks!

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

572 views

1 answer

cuda - CUDA5 Examples: Has anyone translated some cutil definitions to CUDA5?

Has anyone started to work with the CUDA5 SDK? I have an old project that uses some cutil functions, but ... sdkCreateTimer Just that simple...

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

573 views

1 answer

cuda - How to use cudaMalloc / cudaMemcpy for a pointer to a structure containing pointers?

I've looked all around this site and others, and nothing has worked. I'm resorting to posting a question for my ... other things ... } Thanks!

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

661 views

1 answer

cuda - how to cast thrust::device_vector<int> to raw pointer

I have a thrust device_vector. I want to cast it to a raw pointer so that I can pass it to a kernel. How can I ... kernel<<<bl,tpb>>>(pass raw)

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

1.3k views

1 answer

cuda - Dynamic Parallelism - undefined reference to __cudaRegisterLinkedBinary linking error while compiling - separate compilation

I run into a problem when I try to compile simple code in which the C++ and CUDA code are compiled separately. ... Hat card: K20x Any idea? Thanks

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

479 views

1 answer

cuda block synchronization

I have b blocks and each block has t threads. I can use __syncthreads() to synchronize the ... blocks. How can I do this?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

758 views

1 answer

cuda - Default Pinned Memory Vs Zero-Copy Memory

In CUDA we can use pinned memory to copy data from host to GPU more efficiently than the default ... is a better programming practice.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

368 views

1 answer

cuda - Is starting 1 thread per element always optimal for data independent problems on the GPU?

I was writing a simple memcpy kernel to measure the memory bandwidth of my GTX 760M and to compare it to ... independent of all other elements?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

460 views

1 answer

cuda - cublas matrix inversion from device

I am trying to run a matrix inversion from the device. This logic works fine if called from the host. ... memory access inside the kernel.

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

704 views

1 answer

cuda - Is there a way of setting default value for shared memory array?

Consider the following code: __global__ void kernel(int *something) { extern __shared__ int shared_array[]; // Some ... cell in some thread?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

754 views

1 answer

cuda - Branch predication on GPU

I have a question about branch predication in GPUs. As far as I know, GPUs handle branches with predication. ... memory, and so on? Thanks

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

624 views

1 answer

cuda - Copying data to "cufftComplex" data struct?

I have data stored as arrays of floats (single precision). I have one array for my real data, and one array ... it is stored in memory. Thanks!

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)
