Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

concurrency - GLSL SpinLock only Mostly Works

I have implemented a depth peeling algorithm using a GLSL spinlock (inspired by this). In the following visualization, notice that the depth peeling algorithm works correctly overall (first layer top left, second layer top right, third layer bottom left, fourth layer bottom right). The four depth layers are stored in a single RGBA texture, one layer per channel.
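The question doesn't show how the two images are initialized. Because insert() only places a depth in front of a larger stored value, the data texture presumably has to start at a value no visible fragment can beat, and the lock texture has to start at 0 (unlocked). A minimal sketch of such a clear pass, assuming the same image bindings as the shader below and the default glDepthRange(0, 1), so that 1.0 is the far-plane depth:

```glsl
#version 420 core

// Hypothetical clear pass, drawn full-screen before the peeling pass.
// Assumes the same bindings as the question's peeling shader.
layout(r32ui)   writeonly uniform uimage2D img2D_0; // lock texture
layout(rgba32f) writeonly uniform image2D  img2D_1; // four sorted depths

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    // 0 = lock is free.
    imageStore(img2D_0, coord, uvec4(0u));

    // 1.0 is the far-plane depth under the default depth range, so any
    // nearer fragment will be inserted by insert() in the peeling pass.
    imageStore(img2D_1, coord, vec4(1.0));

    discard; // results were written with imageStore
}
```

On OpenGL 4.4 and later, glClearTexImage can do the same from the host. Either way, image stores from one draw are only guaranteed visible to image loads in the next after glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT).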

Unfortunately, the spinlock sometimes fails to prevent errors: you can see little white speckles, particularly in the fourth layer. There's also one on the wing of the spaceship in the second layer. These speckles change from frame to frame.

[Image: visualization of the four depth layers]

In my GLSL spinlock, when a fragment is to be drawn, the fragment program atomically reads and writes a locking value in a separate locking texture, waiting until a 0 shows up, which indicates that the lock is open. In practice, I found that the waiting loop has to wrap the entire read-modify-write logic rather than spin on the lock alone: if two threads in the same warp are on the same pixel, the warp cannot make progress, because one thread must wait while the other continues, yet all threads in a GPU warp execute in lockstep.
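For contrast, a naive spinlock that spins on the lock alone looks like the sketch below. On hardware that runs a warp in lockstep, it can hang: a lane that wins the lock may be masked off at the end of the loop until every other lane has also exited, but a losing lane in the same warp can only exit once the winner releases the lock. The critical section is left as a placeholder comment.

```glsl
#version 420 core

layout(r32ui) coherent uniform uimage2D img2D_0; // lock texture, 0 = free

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    // Naive spin: each thread loops until its own exchange returns 0.
    while (imageAtomicExchange(img2D_0, coord, 1u) != 0u) {
        // If two lanes of one warp contend for this pixel, the winner can be
        // parked here waiting for the loser to leave the loop, while the loser
        // waits for the winner to reach the release below: a deadlock.
    }

    // ... critical section (placeholder) ...

    memoryBarrier();
    imageAtomicExchange(img2D_0, coord, 0u); // release the lock
    discard;
}
```

Putting the acquire, the work, and the release inside one loop iteration, as the question's shader does, lets the winning lane finish and release within the same pass through the loop body.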

My fragment program looks like this (comments and spacing added):

#version 420 core

//locking texture
layout(r32ui) coherent uniform uimage2D img2D_0;
//data texture, also render target
layout(rgba32f) coherent uniform image2D img2D_1;

//Inserts "new_data" into "data", a sorted list
vec4 insert(vec4 data, float new_data) {
    if      (new_data<data.x) return vec4(      new_data,data.xyz);
    else if (new_data<data.y) return vec4(data.x,new_data,data.yz);
    else if (new_data<data.z) return vec4(data.xy,new_data,data.z);
    else if (new_data<data.w) return vec4(data.xyz,new_data      );
    else                      return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    //The idea here is to keep looping over a pixel until a value is written.
    //By looping over the entire logic, threads in the same warp aren't stalled
    //by other waiting threads.  The first imageAtomicExchange call sets the
    //locking value to 1.  If the locking value was already 1, then someone
    //else has the lock, and can_write is false.   If the locking value was 0,
    //then the lock is free, and can_write is true.  The depth is then read,
    //the new value inserted, but only written if can_write is true (the
    //locking texture was free).  The second imageAtomicExchange call resets
    //the lock back to 0.

    bool have_written = false;
    while (!have_written) {
        bool can_write = (imageAtomicExchange(img2D_0,coord,1u) != 1u);

        memoryBarrier();

        vec4 depths = imageLoad(img2D_1,coord);
        depths = insert(depths,gl_FragCoord.z);

        if (can_write) {
            imageStore(img2D_1,coord,depths);
            have_written = true;
        }

        memoryBarrier();

        imageAtomicExchange(img2D_0,coord,0u);

        memoryBarrier();
    }
    discard; //Already wrote to render target with imageStore
}

My question is: why does this speckling occur? I want the spinlock to work 100% of the time! Could it be related to where I placed the memoryBarrier() calls?

Fight dragons too long and you become a dragon yourself; gaze into the abyss too long and the abyss gazes back at you...

1 Answer

For reference, here is locking code that has been tested to work with NVIDIA drivers 314.22 and 320.18 on a GTX 670. Note that existing compiler optimization bugs are triggered if the code is reordered or rewritten into logically equivalent code (see the comments in the code below). Note also that the code below uses bindless image references.

// sem is initialized to zero
coherent uniform layout(size1x32) uimage2D sem;

void main(void)
{
    ivec2 coord = ivec2(gl_FragCoord.xy);

    bool done = false;
    uint locked = 0;
    while(!done)
    {
        // locked = imageAtomicCompSwap(sem, coord, 0u, 1u); will NOT work
        locked = imageAtomicExchange(sem, coord, 1u);
        if (locked == 0)
        {
            // placeholder for your own code, e.g. the load/insert/store on the protected image
            performYourCriticalSection();

            memoryBarrier();

            imageAtomicExchange(sem, coord, 0u);

            // replacing this with a break will NOT work
            done = true;
        }
    }

    discard;
}
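The main structural difference from the question's shader is where the lock is released. In the question's loop, imageAtomicExchange(img2D_0,coord,0) runs at the end of every iteration, including in threads whose first exchange returned 1 (can_write is false). Such a waiting thread clears a lock that another thread still holds, so a third fragment can acquire it while the owner is still between its imageLoad and imageStore, and one of the two inserts can be lost; that is a plausible source of the speckles. In the code above, the release sits inside if (locked == 0). The sketch below applies that structure to the question's shader. It is untested, and because the answer reports that logically equivalent rewrites triggered compiler bugs on the drivers it names, results may depend on the driver.

```glsl
#version 420 core

layout(r32ui)   coherent uniform uimage2D img2D_0; // lock texture, 0 = free
layout(rgba32f) coherent uniform image2D  img2D_1; // four sorted depths

// Same sorted insertion as in the question.
vec4 insert(vec4 data, float new_data) {
    if      (new_data < data.x) return vec4(new_data, data.xyz);
    else if (new_data < data.y) return vec4(data.x, new_data, data.yz);
    else if (new_data < data.z) return vec4(data.xy, new_data, data.z);
    else if (new_data < data.w) return vec4(data.xyz, new_data);
    else                        return data;
}

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);

    bool done = false;
    while (!done) {
        // Only the thread that gets 0 back owns the lock.
        uint locked = imageAtomicExchange(img2D_0, coord, 1u);
        if (locked == 0u) {
            // Critical section: read, insert, and write back under the lock.
            vec4 depths = imageLoad(img2D_1, coord);
            imageStore(img2D_1, coord, insert(depths, gl_FragCoord.z));

            // Make the store visible before the lock is released.
            memoryBarrier();

            // Release: executed only by the lock owner.
            imageAtomicExchange(img2D_0, coord, 0u);
            done = true;
        }
        // Threads that got 1 back touch nothing and simply retry.
    }
    discard; // results were written with imageStore
}
```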

...