In the Nsight Visual Studio Edition 3.0 CUDA Profiler, Issue Efficiency displays a pie chart of the warp stall reasons. The stall reasons are Instruction Fetch, Execution Dependency, Data Requests, Texture, Synchronization, and Other.
For Compute Capability 3.* devices the Other category is the percentage of time that active warps are stalled due to the following reasons:
- execution unit is busy (reduce use of low throughput integer operations)
- register bank conflicts (compiler issue that can sometimes be made worse by heavy use of vector data types)
- too few warps per scheduler
For Compute Capability 5.* and 6.* devices the Other category is the percentage of time that active warps are stalled due to the following reasons:
- register bank conflicts (compiler issue that can sometimes be made worse by heavy use of vector data types)
- warps waiting to resolve branches
- warps that are lower priority and are not currently being considered for scheduling
For Compute Capability 5.* and 6.* devices, especially GP100, the last reason can be very high (~75%) if the kernel reaches 32 warps per warp scheduler.
These stall reasons are grouped into the Other category because it is hard to identify actions that a developer can take to resolve them.
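
To see why GP100 is singled out for the "lower priority" stall, it can help to work out how many resident warps each warp scheduler has to choose from at full occupancy. The sketch below is plain arithmetic, not a profiler API. The per-SM limits in the table (maximum resident warps and number of warp schedulers) are assumptions taken from NVIDIA's published architecture figures and should be checked against the CUDA Programming Guide for your device. The names `SM_LIMITS` and `warps_per_scheduler` are made up for this illustration.

```python
# Assumed per-SM limits: (max resident warps per SM, warp schedulers per SM).
# Verify these against the CUDA Programming Guide for your exact device.
SM_LIMITS = {
    "3.x (Kepler)": (64, 4),
    "5.x (Maxwell)": (64, 4),
    "6.0 (GP100)": (64, 2),
    "6.1 (GP10x)": (64, 4),
}


def warps_per_scheduler(max_warps_per_sm, schedulers_per_sm, occupancy=1.0):
    """Resident warps each scheduler holds at the given achieved occupancy (0..1)."""
    return max_warps_per_sm * occupancy / schedulers_per_sm


if __name__ == "__main__":
    for name, (warps, schedulers) in SM_LIMITS.items():
        print(name, warps_per_scheduler(warps, schedulers))
```

Under these assumptions, a fully occupied GP100 SM gives each scheduler 32 warps while the other listed architectures give each scheduler 16. A scheduler can only issue from a small number of warps per cycle, so with more resident warps per scheduler a larger share of them sits eligible but unselected, which would be consistent with the high percentage mentioned above.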