gpu - Is the warmup code necessary when measuring CUDA kernel running time?


asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

On page 85 of Professional CUDA C Programming:

int main()
{
    ......
    // run a warmup kernel to remove overhead
    // seconds() is assumed to return wall-clock time as a double, so the
    // timestamps are kept as doubles instead of being truncated to size_t
    double iStart, iElaps;
    cudaDeviceSynchronize();
    iStart = seconds();
    warmingup<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("warmup <<< %4d %4d >>> elapsed %f sec\n", grid.x, block.x, iElaps);

    // run kernel 1
    iStart = seconds();
    mathKernel1<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel1 <<< %4d %4d >>> elapsed %f sec\n", grid.x, block.x, iElaps);

    // run kernel 2
    iStart = seconds();
    mathKernel2<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel2 <<< %4d %4d >>> elapsed %f sec\n", grid.x, block.x, iElaps);

    // run kernel 3
    iStart = seconds();
    mathKernel3<<<grid, block>>>(d_C);
    cudaDeviceSynchronize();
    iElaps = seconds() - iStart;
    printf("mathKernel3 <<< %4d %4d >>> elapsed %f sec\n", grid.x, block.x, iElaps);
    ......
}

We can see there is a warmup before measuring the running time of different kernels.

From the question "GPU cards warming up?", I understand the reason to be:

If they are non-display cards, it might well be the driver shutting itself down after a period of inactivity. So what you are seeing on the first run might well be initialization overhead that only happens once.

So if my GPU card has not been idle for a long time, e.g., I have just used it to run some other programs, it should not need any warmup code. Is my understanding right?


1 Answer

深蓝 · Answer 1 · 2021-10-23T21:42:57+0000

Besides the GPU being in a power-saving state, there can be a number of other reasons why the first launch of a kernel is slower than subsequent launches:

just-in-time compilation
transfer of kernel to GPU memory
cache content
...

For these reasons it is always good practice to perform at least one "warmup run" before the timed kernel run, if you are interested in the sustained speed that consecutive kernel launches achieve.
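As a rough illustration of the warmup-then-measure pattern, here is a minimal sketch that times the first launch of a kernel separately from later launches using CUDA events. The kernel mathKernel, the helper timeLaunchMs, the problem size and the run count are all hypothetical stand-ins, not the book's code. How large the gap between the first and later launches is (if there is one at all) depends on the GPU, the driver and how the binary was built.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernels in the question.
__global__ void mathKernel(float *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        c[tid] = (tid % 2 == 0) ? 100.0f : 200.0f;
    }
}

// Hypothetical helper: launch the kernel once and return the time
// between the two events in milliseconds.
static float timeLaunchMs(float *d_C, int n, dim3 grid, dim3 block)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    mathKernel<<<grid, block>>>(d_C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until the kernel and stop event are done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;  // arbitrary problem size for illustration
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    float *d_C = nullptr;
    if (cudaMalloc((void **)&d_C, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Warmup run: this launch absorbs any one-time costs, so it is
    // reported separately and kept out of the average.
    float firstMs = timeLaunchMs(d_C, n, grid, block);

    // Timed runs: average over several launches of the same kernel.
    const int runs = 10;
    float totalMs = 0.0f;
    for (int i = 0; i < runs; ++i) {
        totalMs += timeLaunchMs(d_C, n, grid, block);
    }

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_C);
        return 1;
    }

    printf("first launch            : %f ms\n", firstMs);
    printf("mean of %d later launches: %f ms\n", runs, totalMs / runs);

    cudaFree(d_C);
    return 0;
}
```

Assuming the file is saved as warmup.cu, it should build with nvcc warmup.cu -o warmup. Averaging over several timed launches rather than a single one also smooths out run-to-run noise.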

If, however, you have a specific application and use case in mind, it always makes sense to benchmark that application under the relevant circumstances. Be prepared, though, for much larger variations in runtime in that less controlled measurement.
