Since I can't find a suitable dupe, I'll just post it.
The main idea here is to exploit pshufb's dual use as a parallel 16-entry table lookup to reverse the bits of each nibble. Reversing the bytes is the obvious part. Reversing the order of the two nibbles within every byte can be done either by building it into the lookup tables (saves a shift) or by explicitly shifting the low nibble up (saves a LUT).
Something like this in total, not tested:
#include <immintrin.h>

__m256i rbit32(__m256i x) {
    // Reverse the bytes within each 32-bit element (pshufb indexes within each 128-bit lane).
    __m256i shufbytes = _mm256_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12,
                                         3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12);
    // 16-entry LUT: bit-reversal of a nibble, result left in the low nibble of each byte.
    __m256i luthigh = _mm256_setr_epi8(0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15,
                                       0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
    // Same LUT with the result pre-shifted into the high nibble (entries are <= 15, so no carry between bytes).
    __m256i lutlow = _mm256_slli_epi16(luthigh, 4);
    __m256i lowmask = _mm256_set1_epi8(15);
    __m256i rbytes = _mm256_shuffle_epi8(x, shufbytes);
    // Low nibble of each byte -> bit-reversed and placed in the high nibble.
    __m256i high = _mm256_shuffle_epi8(lutlow, _mm256_and_si256(rbytes, lowmask));
    // High nibble of each byte -> bit-reversed and placed in the low nibble.
    __m256i low = _mm256_shuffle_epi8(luthigh, _mm256_and_si256(_mm256_srli_epi16(rbytes, 4), lowmask));
    return _mm256_or_si256(low, high);
}
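Since this is untested as posted, a quick sanity check is to compare it against a scalar bit-reversal. A minimal sketch, assuming the function above is compiled in the same file with AVX2 enabled (rbit32_scalar, the test values and the harness below are mine, not part of the answer):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

// Scalar reference (hypothetical helper): classic swap-based 32-bit bit reversal.
static uint32_t rbit32_scalar(uint32_t v) {
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

int main(void) {
    uint32_t in[8] = {0x00000001u, 0x80000000u, 0x12345678u, 0xDEADBEEFu,
                      0xFFFFFFFFu, 0x0F0F0F0Fu, 0xA5A5A5A5u, 0x13579BDFu};
    uint32_t out[8];
    _mm256_storeu_si256((__m256i *)out, rbit32(_mm256_loadu_si256((const __m256i *)in)));
    for (int i = 0; i < 8; i++)
        printf("%08" PRIX32 " -> %08" PRIX32 " (%s)\n", in[i], out[i],
               out[i] == rbit32_scalar(in[i]) ? "ok" : "MISMATCH");
    return 0;
}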
In a typical context, i.e. inside a loop, the loads of those constants should be hoisted out of the loop.
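For illustration, a hedged sketch of such a loop (the array-processing wrapper and its names are mine, not part of the answer); with AVX2 enabled, the compiler can keep the three constant vectors in registers across iterations. It needs <stdint.h> and <stddef.h> in addition to <immintrin.h>:

// Hypothetical wrapper: bit-reverse every 32-bit element of p[0..n) in place.
void rbit32_array(uint32_t *p, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        _mm256_storeu_si256((__m256i *)(p + i), rbit32(v));
    }
    for (; i < n; i++) {          // scalar tail for the last 0..7 elements
        uint32_t v = p[i], r = 0;
        for (int b = 0; b < 32; b++)
            r = (r << 1) | ((v >> b) & 1u);
        p[i] = r;
    }
}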
Curiously, Clang uses 4 shuffles: it duplicates the first shuffle.