timer - CUDA: cudaEvent_t and cudaThreadSynchronize usage

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am a bit confused about the usage of cudaEvent_t. Currently, I am using the clock() call like this to find the duration of a kernel call:

cudaThreadSynchronize();
clock_t begin = clock();

fooKernel<<< x, y >>>( z, w );

cudaThreadSynchronize();
clock_t end = clock();

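// Print time difference in milliseconds (clock() counts in CLOCKS_PER_SEC ticks)
printf("%.3f ms\n", 1000.0 * (end - begin) / CLOCKS_PER_SEC);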

Looking for a higher-resolution timer, I am considering using cudaEvent_t. Do I need to call cudaThreadSynchronize() before I note down the time using cudaEventRecord(), or is it redundant?

The reason I am asking is that there is another call, cudaEventSynchronize(), which seems to wait until the event is recorded. If the recording is delayed, won't the calculated time difference include some extra time after the kernel has finished executing?

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:14:49+0000

Actually, there are even more synchronization functions (e.g. cudaStreamSynchronize). The CUDA programming guide describes in detail what each of them does. Using events as timers basically comes down to this:

//create events
cudaEvent_t event1, event2;
cudaEventCreate(&event1);
cudaEventCreate(&event2);

//record events around kernel launch
cudaEventRecord(event1, 0); //where 0 is the default stream
kernel<<<grid,block>>>(...); //also using the default stream
cudaEventRecord(event2, 0);

//synchronize
cudaEventSynchronize(event1); //optional
cudaEventSynchronize(event2); //wait for the event to be executed!

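//calculate time (in milliseconds, with a resolution of around 0.5 microseconds)
float dt_ms;
cudaEventElapsedTime(&dt_ms, event1, event2);

//release the events once they are no longer needed
cudaEventDestroy(event1);
cudaEventDestroy(event2);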

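It's important to synchronize on event2 because you want to make sure everything has executed before calculating the time. Since both events and the kernel are in the same stream (where order is preserved), event1 and the kernel have completed by then as well. This also addresses the concern about a delayed recording: an event's timestamp is taken on the GPU when the stream reaches it, after all preceding work in that stream has finished, not when the host returns from cudaEventSynchronize(). The host-side wait therefore does not add to the measured interval, and no cudaThreadSynchronize() is needed before cudaEventRecord() either.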
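To see that a late host-side wait does not inflate the result, here is a minimal, self-contained sketch. It assumes a hypothetical `scale` kernel, a hypothetical helper `time_kernel_ms`, and an arbitrary problem size purely for illustration, and it omits error checking for brevity. The `host_delay_ms` parameter makes the host sleep before calling cudaEventSynchronize(); the value reported by cudaEventElapsedTime() should stay roughly the same regardless of that delay.

```cuda
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: scales every element.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: times one launch of `scale` with CUDA events.
// `host_delay_ms` makes the host sleep before cudaEventSynchronize() to show
// that a late host-side wait does not change the measured GPU interval.
// Error checking is omitted for brevity.
float time_kernel_ms(int n, int host_delay_ms) {
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int block = 256;
    int grid = (n + block - 1) / block;

    cudaEventRecord(start, 0);                // enqueued in the default stream
    scale<<<grid, block>>>(d_data, n, 2.0f);  // same stream: runs after `start`
    cudaEventRecord(stop, 0);                 // recorded once the kernel has finished

    // The GPU timestamps `stop` when the stream reaches it, independently of
    // when (or whether) the host waits for it.
    std::this_thread::sleep_for(std::chrono::milliseconds(host_delay_ms));

    cudaEventSynchronize(stop);               // block the host until `stop` completes

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main() {
    const int n = 1 << 24;
    time_kernel_ms(n, 0);  // warm-up: the first launch may include one-time setup costs
    float no_delay = time_kernel_ms(n, 0);
    float with_delay = time_kernel_ms(n, 200);
    printf("kernel time, no host delay:     %.3f ms\n", no_delay);
    printf("kernel time, 200 ms host delay: %.3f ms\n", with_delay);
    return 0;
}
```

The two printed values should be close to each other; the 200 ms sleep on the host shows up in neither, because it happens outside the interval between the two GPU timestamps. Exact numbers depend on the GPU.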

You could call cudaStreamSynchronize or even cudaThreadSynchronize instead, but both are overkill in this case. (cudaThreadSynchronize is also deprecated; cudaDeviceSynchronize is its replacement.)
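For comparison, a sketch of the heavier device-wide approach, close to what the question does with clock(). It uses cudaDeviceSynchronize() (the current name for cudaThreadSynchronize()) and std::chrono::steady_clock instead of clock(), since clock() reports processor time used by the program rather than elapsed wall-clock time. The `scale` kernel and the `time_kernel_host_ms` helper are hypothetical, and error checking is omitted.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: scales every element.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: host-side wall-clock timing. The whole device is
// drained before and after the launch, so the interval covers the kernel
// plus launch and synchronization overhead.
double time_kernel_host_ms(float* d_data, int n) {
    int block = 256;
    int grid = (n + block - 1) / block;

    cudaDeviceSynchronize();  // make sure no earlier work is still running
    auto begin = std::chrono::steady_clock::now();

    scale<<<grid, block>>>(d_data, n, 2.0f);

    cudaDeviceSynchronize();  // waits for all work on the device, not just one stream
    auto end = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(end - begin).count();
}

int main() {
    const int n = 1 << 24;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    time_kernel_host_ms(d_data, n);  // warm-up launch
    printf("host-measured time: %.3f ms\n", time_kernel_host_ms(d_data, n));

    cudaFree(d_data);
    return 0;
}
```

Because this interval also includes launch latency and the time the host needs to return from cudaDeviceSynchronize(), it is typically somewhat larger than the event-based measurement of the same kernel, and it stalls the host and every stream on the device while doing so.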
