Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
860 views
in Technique[技术] by (71.8m points)

graphics - How is a Vulkan pipeline barrier implemented in terms of the GPU or its driver?

I thought pipeline barriers were some kind of reordering of commands in the kernel-mode driver, but that does not seem to be true. I also thought they might be hints for a driver-side GPU scheduler, but that doesn't seem to be true either. Are they just hints for building the KMD's main command buffer, or does a pipeline barrier correspond to some sort of instruction for the GPU's command processor?

Edit: How can pipeline barriers possibly be implemented?

question from:https://stackoverflow.com/questions/65944292/how-vulkan-pipeline-barrier-is-implemented-in-terms-of-gpu-or-its-driver


1 Answer

0 votes
by (71.8m points)

As Nicol said, the details will be implementation- and barrier-specific, but the general principles are similar.

GPUs are pipelined processors, which means that each command goes through several stages of execution. Stages run concurrently, both within and between draw calls. For example, fragment processing for draw command N might execute at the same time as vertex processing for draw command N+1. A barrier command prevents later commands from starting to execute any stage in the dstStageMask until previous commands have finished executing all stages in the srcStageMask. In addition to this execution barrier, a pipeline barrier can also include a memory barrier, which ensures that the memory accesses done by the earlier commands are properly ordered with respect to the memory accesses of the later commands.

So none of this has to do with reordering; it has to do with the fact that new commands start before previous commands finish, and sometimes you need to prevent that from happening. The most obvious example is render-to-texture: you want all of the writes to the texture to finish before any of the reads from the texture occur. Without a pipeline barrier, nothing prevents those from overlapping.

Generally GPUs will have an (internal) command that "signals" some marker after all previous commands have passed the point where the command executes, and another command that "waits" for some marker to become signaled before allowing any subsequent work to proceed. The marker might simply be a memory location (signal: write a specific value; wait: spin until the location contains that value), or it might be a special-purpose on-chip resource of some kind. A barrier is then just a signal command followed by a wait command. This creates a "bubble" in the pipeline where parts of the GPU sit idle. To minimize the bubble, some GPUs can signal and wait at multiple points in the pipeline. It's common, for example, to be able to signal after color writes have completed and wait before rasterization, allowing vertex work for draw N+1 to proceed while fragment work for draw N is still running, but stalling the N+1 fragment work. The srcStageMask and dstStageMask let implementations know precisely what the dependencies are, but most can't actually take advantage of such fine granularity and will create a coarser, conservative bubble.
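The memory-location variant of that marker can be sketched on the CPU with an atomic flag; this is a hypothetical illustration of the signal/wait pattern, not any real GPU's command set:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// A memory location acting as the "marker": signal writes a value after
// earlier work finishes, wait spins until that value appears.
static std::atomic<uint32_t> marker{0};
static int renderTarget = 0;  // stands in for memory written by earlier commands

void signalMarker(uint32_t value) {
    // release: all prior writes are ordered before the marker update
    marker.store(value, std::memory_order_release);
}

void waitMarker(uint32_t value) {
    // acquire: spin until the marker holds the expected value
    while (marker.load(std::memory_order_acquire) != value) { /* spin */ }
}

void earlierCommands() {
    renderTarget = 42;  // "draw N" finishes its writes...
    signalMarker(1);    // ...then the signal command fires
}

int laterCommands() {
    waitMarker(1);       // the wait command stalls "draw N+1"...
    return renderTarget; // ...so its read sees the finished result
}
```

Run earlierCommands and laterCommands on two threads and the wait guarantees the dependent read observes the completed write, just as the GPU's wait command stalls later pipeline stages.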

Memory barriers correspond to cache clean (write dirty lines back to memory) and cache invalidate (remove cached data) operations. "Availability" means cleaning the writer's local cache, writing its data back to memory, so that any later read from memory will see the new values. "Visibility" means invalidating the reader's local cache, so that any new reads will miss in the cache (instead of returning stale data) and fetch from memory. GPUs typically have special commands for these operations targeting specific local caches (e.g. the texture cache, the depth/stencil cache, and others).
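The clean/invalidate pair can be modeled with a toy write-back cache; everything here (Memory, Cache, the method names) is an illustrative model, not a real driver interface:

```cpp
#include <cassert>
#include <map>

struct Memory { std::map<int, int> cells; };  // backing memory

// Toy write-back cache: clean() is the "availability" side of a memory
// barrier (flush dirty lines to memory); invalidate() is the "visibility"
// side (drop lines so later reads refetch from memory).
struct Cache {
    Memory* mem;
    std::map<int, int> lines;   // address -> cached value
    std::map<int, bool> dirty;  // address -> dirty flag

    void write(int addr, int value) {  // write-back: stays in this cache
        lines[addr] = value;
        dirty[addr] = true;
    }
    int read(int addr) {               // hit in cache, else fetch from memory
        if (lines.count(addr)) return lines[addr];
        return lines[addr] = mem->cells[addr];
    }
    void clean() {                     // flush dirty lines back to memory
        for (auto& [addr, d] : dirty)
            if (d) { mem->cells[addr] = lines[addr]; d = false; }
    }
    void invalidate() {                // discard all cached lines
        lines.clear();
        dirty.clear();
    }
};
```

With a writer cache and a reader cache over the same Memory, the reader keeps seeing stale data until the writer cleans *and* the reader invalidates; doing only one of the two is not enough, which is why a barrier's source and destination access masks both matter.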

So in the render-to-texture case, you'd insert a pipeline barrier between the texture-rendering commands and any commands that read from the texture. The driver would generate a GPU command (or commands) that waits for previous commands to finish executing the blend stage, then cleans the color-attachment cache, then signals a marker. It would then generate GPU commands that wait for the marker to be signaled and invalidate the texture cache before allowing any subsequent draw calls to start fragment shading.

As I said earlier, the actual commands the GPU provides might be much coarser than this, e.g. it might be "wait for all previous draw commands to entirely complete, then clean and invalidate all caches that don't have automatic coherency, and then allow subsequent draw commands to begin vertex fetch and vertex shading."

