Recent questions tagged cuda

0 votes

893 views

1 answer

cuda - Reading from an unaligned uint8_t recast as a uint32_t array - not getting all values

I am trying to cast a uint8_t array to a uint32_t array. However, when I try to do this, I can't seem to be able ... any way that I can do this?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

1.2k views

1 answer

cuda - nvcc.exe linking error: Microsoft Visual Studio configuration file 'vcvars64.bat' could not be found

I want to use nvcc -ptx from the Windows command line, but I always get this error message: nvcc : fatal error ... . What can be the solution?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

688 views

1 answer

cuda - CUDA_ERROR_INVALID_IMAGE during cuModuleLoad

I've created a very simple kernel (can be found here) which I successfully compile using "C:Program ... valid and compiles without issues.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

686 views

1 answer

cuda - How to start debug version of project in nsight with optirun command?

I've been writing some simple CUDA programs (I'm a student, so I need to practice), and the thing is I can ... for helping in advance folks. :)

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

659 views

1 answer

cuda - Do I have to use the MPS (MULTI-PROCESS SERVICE) when using CUDA6.5 + MPI?

At this link it is written: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf 1.1. AT A GLANCE ... will stay the same?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

786 views

1 answer

cuda - How are 2D thread blocks padded for warp scheduling?

I understand that for a 1D thread block with 31 threads, it will be padded to 32 threads for warp execution. What ... (31*31=961; 961%32=1)?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
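
The arithmetic quoted in the teaser above (31*31=961; 961%32=1) can be checked on the host. The sketch below assumes the layout described in the CUDA programming guide: threads of a block are numbered linearly with x varying fastest, and consecutive groups of 32 linear IDs form a warp. The helper names (`linear_thread_id`, `warps_per_block`, `threads_in_last_warp`) are hypothetical and exist only for this illustration. It is not taken from the linked answer.

```cpp
// Warp size is 32 on NVIDIA GPUs to date (device code can read the built-in warpSize).
constexpr unsigned kWarpSize = 32;

// Hypothetical helper: linear thread ID inside a block of size (dimX, dimY, dimZ).
// x varies fastest, then y, then z.
constexpr unsigned linear_thread_id(unsigned tx, unsigned ty, unsigned tz,
                                    unsigned dimX, unsigned dimY) {
    return tx + ty * dimX + tz * dimX * dimY;
}

// Hypothetical helper: number of warps the block occupies, ceil(threads / 32).
constexpr unsigned warps_per_block(unsigned dimX, unsigned dimY, unsigned dimZ) {
    return (dimX * dimY * dimZ + kWarpSize - 1) / kWarpSize;
}

// Hypothetical helper: threads actually present in the last (possibly partial) warp.
constexpr unsigned threads_in_last_warp(unsigned dimX, unsigned dimY, unsigned dimZ) {
    return (dimX * dimY * dimZ) % kWarpSize == 0
               ? kWarpSize
               : (dimX * dimY * dimZ) % kWarpSize;
}

// The 31x31 block from the question: 961 threads occupy 31 warps,
// 30 of them full and the last one holding a single thread.
static_assert(warps_per_block(31, 31, 1) == 31, "961 threads need 31 warps");
static_assert(threads_in_last_warp(31, 31, 1) == 1, "961 % 32 == 1");

// Thread (x=0, y=1) has linear ID 31, so it still sits in warp 0:
// rows are not padded to 32 individually.
static_assert(linear_thread_id(0, 1, 0, 31, 31) / kWarpSize == 0,
              "second row starts inside warp 0");
```

Under these assumptions only the final warp is partially filled, so the block carries 31 idle lanes in total rather than one idle lane per row.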

0 votes

744 views

1 answer

cuda - thrust::sequence - how to increase the step after each N elements

I am using thrust::sequence(myvector.begin(), myvector.end(), 0, 1) and get a well-ordered list like: 0, 1, ... or am I missing a simple way..

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

492 views

1 answer

cuda kernels not executing concurrently

I'm trying to explore the concurrent kernel execution property of my NVIDIA Quadro 4000, which has 2.0 ... CHK_ERR(cudaDeviceReset()); }

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

675 views

1 answer

cuda - Should I check the number of threads in kernel code?

I am a beginner with CUDA, and my coworkers always design kernels with the following wrapping: __global__ ... specified block/grid dimensions?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

576 views

1 answer

cuda - JIT in JCuda, loading multiple ptx modules

I said in this question that I had some problems loading PTX modules in JCuda and after @talonmies's idea, I ... variable by reference in JCuda?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

658 views

1 answer

cuda - Caffe compilation fails due to unsupported gcc compiler version

I am struggling with Caffe compilation. Unfortunately, I failed to compile it. Steps I followed: git clone https://github.com/ ... .9 - what to do?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

915 views

1 answer

cuda - CURAND Library - Compiling Error - Undefined reference to functions

I have the following code which I am trying to compile using nvcc. Code: #include <stdio.h> #include <stdlib.h ... to solve my problem. Thanks!

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

755 views

1 answer

cuda - Performance of atomic operations on shared memory

How do atomic operations perform when the address they are provided with resides in block shared memory? During ... atomic operation is done?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

906 views

1 answer

cuda - thrust reduction result on device memory

Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? In case it is ... I use a thrust::device_ptr?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

675 views

1 answer

cuda - What are "Other" Issue Stall Reasons displayed by the Nsight profiler?

I have a kernel that is performing poorly on CC 3.0 (Kepler) as opposed to CC 2.0 (Fermi). In the Nsight profiler, ... Nsight 3.0. RC / CC 3.0.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

569 views

1 answer

cuda - Does 'code=sm_X' embed only binary (cubin) code, or also PTX code, or both?

I am a little bit confused about the 'code=sm_X' option within the '-gencode' statement. An example: What does ... is conflicting in my opinion.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

684 views

1 answer

cuda - Amdahl's law and GPU

I have a couple of doubts regarding the application of Amdahl's law with respect to GPUs. For instance, I ... for the parallel code? Thanks

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

941 views

1 answer

cuda - How to use GPUDirect RDMA with InfiniBand

I have two machines. There are multiple Tesla cards on each machine. There is also an InfiniBand card on each ... dealing with this in OpenMPI.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

433 views

1 answer

cuda - __activemask() vs __ballot_sync()

After reading this post on the CUDA Developer Blog, I am struggling to understand when it is safe/correct to use __activemask ... the function interface.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

919 views

1 answer

cuda - How do we use cuPrintf()?

What do we have to do to use cuPrintf()? (device compute capability 1.2, Ubuntu 12) I couldn't find " ... "hello_kernel") is not allowed Thanks!

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

974 views

1 answer

cuda - Set default host compiler for nvcc

I have just installed Debian Stretch (9) and CUDA 8 on a new GPU server. Stretch does not come with ... CUDA config or an environment variable?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

889 views

1 answer

cuda - What is the difference between cudaMemcpy() and cudaMemcpyPeer() for P2P-copy?

I want to copy data from GPU0-DDR to GPU1-DDR directly, without going through CPU-RAM. As stated here on page 15: http: ... any advantage, why it is needed?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

917 views

1 answer

cuda - Equivalent of cudaGetErrorString for cuBLAS?

CUDA runtime has a convenience function cudaGetErrorString(cudaError_t error) that translates an error enum into a ... function like this?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

1.1k views

1 answer

cuda - Branch and predicated instructions

Section 5.4.2 of the CUDA C Programming Guide states that branch divergence is handled either by "branch ... set the predicate". Why?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

503 views

1 answer

cuda - How to compile PTX code

I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions ... cubin) to "X.o" file.

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
