Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Left shift of 128-bit number using AVX2 instructions

I am trying to left-rotate a 128-bit number using AVX2.

Since there is no direct method of doing this, I have tried using left shift and right shift to accomplish my task.

Here is a snippet of my code that does this:

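        long long l = 4;
        long long r = 4;
        __m128i targetrotate = _mm_set_epi64x (l, r);                      // counts: {high element, low element}
        __m128i targetleftrotate = _mm_sllv_epi64 (target, targetrotate);  // AVX2: shifts each 64-bit element left by its own count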

The above code snippet is meant to rotate target left by 4.

When I tested the above code with a sample input, I could see the result is not rotated correctly.

Here is the sample input and output:

          input: 01 23 45 67 89 ab cd ef   fe dc ba 98 76 54 32 10
obtained output: 10 30 52 74 96 b8 da fc   e0 cf ad 8b 69 47 25 03

But the output I expect is:

                 12 34 56 78 9a bc de f0   ed cb a9 87 65 43 21 00

I know that I am doing something wrong.

I want to know whether my expected output is right and, if so, what I am doing wrong here.

Any kind of help would be greatly appreciated and thanks in advance.

  asked by krishnan (translated from Stack Overflow)

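He who fights with dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back into you…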

1 Answer

I think you have an endian issue with how you're printing your input and output.

In your actual output, the left-most byte within each 64-bit half is the least-significant byte, so 0xfe << 4 becomes 0xe0, with the f shifting into the next-higher byte.
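
Here is a minimal sketch to check this. It assumes an x86-64 compiler, where SSE2 is baseline. The sketch loads the question's 16 input bytes as a vector, shifts each 64-bit element left by 4, and prints the bytes in ascending memory order. Both counts are 4, so SSE2's _mm_slli_epi64 gives the same result as the question's AVX2 _mm_sllv_epi64. It is used here to avoid needing AVX2. The function names are made up for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstdint>
#include <string>

// Format 16 bytes in ascending memory order, with a wider gap between
// the two 8-byte halves (the layout used in the question).
std::string bytes_in_memory_order(const uint8_t b[16]) {
    static const char hex[] = "0123456789abcdef";
    std::string s;
    for (int i = 0; i < 16; ++i) {
        s += hex[b[i] >> 4];
        s += hex[b[i] & 15];
        if (i == 7) s += "   ";
        else if (i != 15) s += ' ';
    }
    return s;
}

// Load 16 bytes, shift each 64-bit element left by 4, and format the
// result byte by byte in memory order.
std::string shift_each_qword_by_4(const uint8_t in[16]) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    __m128i r = _mm_slli_epi64(v, 4);  // same as _mm_sllv_epi64 with both counts == 4
    uint8_t out[16];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), r);
    return bytes_in_memory_order(out);
}
```

Called on the question's input bytes, shift_each_qword_by_4 reproduces the "obtained output" line exactly: "10 30 52 74 96 b8 da fc   e0 cf ad 8b 69 47 25 03". So the shift itself did what it should.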

See "Convention for displaying vector registers" for more discussion of that.

Your "expected" output matches what you'd get if you were printing values high element first (highest address when stored).
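
Here is a sketch of the other convention. It assumes the two 64-bit elements really held the numbers as written in the question: 0x0123456789abcdef in the high element and 0xfedcba9876543210 in the low one. Shifting each element left by 4 and printing each element as a 64-bit hex integer, high element first, gives the digits of the "expected" output. The function names are hypothetical, and only SSE2 is used.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstdint>
#include <cstdio>
#include <string>

// Format a vector as two 64-bit hex integers, high element (element 1) first.
std::string hex_high_first(__m128i v) {
    uint64_t e[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(e), v);  // e[0] is element 0, at the lowest address
    char buf[40];
    std::snprintf(buf, sizeof buf, "%016llx %016llx",
                  static_cast<unsigned long long>(e[1]),
                  static_cast<unsigned long long>(e[0]));
    return buf;
}

// Assumes the question's numbers are the real element values:
// element 1 = 0x0123456789abcdef, element 0 = 0xfedcba9876543210.
std::string shifted_question_value() {
    __m128i v = _mm_set_epi64x(0x0123456789abcdefLL,
                               static_cast<long long>(0xfedcba9876543210ULL));
    return hex_high_first(_mm_slli_epi64(v, 4));  // shift each element left by 4
}
```

shifted_question_value() returns "123456789abcdef0 edcba98765432100". These are the question's expected digits, without the byte spacing.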

But that's not what you're doing; you're printing each byte separately in ascending memory order.

x86 is little-endian.

This conflicts with the numeral system we use in English, where we read Arabic numerals from left to right with the highest place value on the left: effectively "human big-endian".

Fun fact: The Arabic language reads from right to left so for them, written numbers are "human little-endian".

(And across elements, higher elements are at higher addresses. Printing high elements first makes whole-vector shifts like _mm_bslli_si128, aka pslldq, make sense, because they visibly shift bytes left from one element into the next.)
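
Here is a small illustration that assumes SSE2. _mm_slli_si128 compiles to the same pslldq instruction as _mm_bslli_si128. With high-element-first printing, a one-byte whole-vector left shift visibly moves the 0x88 byte from the top of the low element into the bottom of the high one. The function names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstdint>
#include <cstdio>
#include <string>

// Format a vector as two 64-bit hex integers, high element (element 1) first.
std::string hex_high_first(__m128i v) {
    uint64_t e[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(e), v);  // e[0] is element 0
    char buf[40];
    std::snprintf(buf, sizeof buf, "%016llx %016llx",
                  static_cast<unsigned long long>(e[1]),
                  static_cast<unsigned long long>(e[0]));
    return buf;
}

__m128i byte_shift_input() {
    return _mm_set_epi64x(0x0011223344556677LL,
                          static_cast<long long>(0x8899aabbccddeeffULL));
}

// Shift the whole 128-bit vector left by one byte (pslldq).
// The byte count must be a compile-time constant.
__m128i byte_shift_left_1(__m128i v) {
    return _mm_slli_si128(v, 1);
}
```

Printed high element first, the input "0011223344556677 8899aabbccddeeff" becomes "1122334455667788 99aabbccddeeff00". The shift moves bytes to the left on the page, as the name suggests.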

If you're using a debugger, you're probably printing from within it.

If you're using debug prints, see "print a __m128i variable".


BTW, you can use _mm_set1_epi64x(4) to put the same value in both elements of a vector, instead of using separate l and r variables with the same value.

In _mm_set intrinsics, the high elements come first. This matches the diagrams in Intel's asm manuals and the semantic meaning of a "left" shift as moving bits/bytes to the left.

(For example, see Intel's diagrams and element numbering for pshufd / _mm_shuffle_epi32.)
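
Here is a quick sketch for seeing the argument order, assuming SSE2 on x86-64. The Elements struct and function names are hypothetical. It stores the vector and checks which 64-bit value lands at the lowest address. It also checks that _mm_set1_epi64x(4) builds the same vector as _mm_set_epi64x(4, 4).

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstdint>

struct Elements {
    uint64_t e0;  // element 0, stored at the lowest address
    uint64_t e1;  // element 1
};

Elements elements_of(__m128i v) {
    uint64_t mem[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(mem), v);
    return {mem[0], mem[1]};
}

// True when _mm_set1_epi64x(4) and _mm_set_epi64x(4, 4) build identical vectors.
bool set1_matches_set() {
    __m128i a = _mm_set1_epi64x(4);
    __m128i b = _mm_set_epi64x(4, 4);
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

elements_of(_mm_set_epi64x(1, 2)) gives e0 == 2 and e1 == 1. The last argument of _mm_set_epi64x goes into element 0.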


BTW, AVX512 has rotate instructions such as vprolvq.

But yes, to emulate rotates you want a SIMD version of (x << n) | (x >> (64-n)).
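
Here is a minimal SSE2 sketch of that formula for one runtime count shared by both elements. The function names are hypothetical. It uses the register-count shifts _mm_sll_epi64 / _mm_srl_epi64. For a different count in each element, AVX2's _mm_sllv_epi64 / _mm_srlv_epi64 fit the same pattern. Masking n to 0..63 is one possible choice for counts above 63.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstdint>

// Rotate each 64-bit element of x left by the same runtime count n:
// (x << n) | (x >> (64 - n)), using shifts that take their count from an XMM register.
__m128i rotl_epi64_uniform(__m128i x, unsigned n) {
    n &= 63;  // treat the rotate count mod 64
    __m128i left  = _mm_sll_epi64(x, _mm_cvtsi32_si128(static_cast<int>(n)));
    // When n == 0 this is a shift by 64, which SIMD shifts turn into 0 (the count
    // saturates), so the OR below still returns x unchanged.
    __m128i right = _mm_srl_epi64(x, _mm_cvtsi32_si128(static_cast<int>(64 - n)));
    return _mm_or_si128(left, right);
}

// Copy both elements out for inspection: out[0] = element 0, out[1] = element 1.
void store_elements(__m128i v, uint64_t out[2]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

Take 0x0123456789abcdef in the high element and 0xfedcba9876543210 in the low one. Rotating by 4 gives 0x123456789abcdef0 and 0xedcba9876543210f. The f shifted out of the top of the low element wraps around to its bottom, where a plain shift would lose it.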

Note that x86 SIMD shifts saturate the shift count, unlike scalar shifts which mask the count.

So x >> 64 will shift out all the bits.
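
Here is a sketch of the saturating behaviour, assuming x86-64 with SSE2. The helper name is hypothetical. _mm_srl_epi64 (psrlq) takes its count from an XMM register and zeroes the element for any count above 63. Scalar shr, by contrast, only uses the low 6 bits of a 64-bit shift count.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; _mm_cvtsi128_si64 needs x86-64
#include <cstdint>

// Logical right shift of a 64-bit value using psrlq with a runtime count (count >= 0).
// psrlq does not reduce the count mod 64: any count above 63 produces 0.
uint64_t simd_srl64(uint64_t value, int count) {
    __m128i x = _mm_set1_epi64x(static_cast<long long>(value));
    __m128i r = _mm_srl_epi64(x, _mm_cvtsi32_si128(count));
    return static_cast<uint64_t>(_mm_cvtsi128_si64(r));  // read element 0
}
```

simd_srl64(~0ULL, 64) and simd_srl64(~0ULL, 200) both return 0, while simd_srl64(0xff00, 8) returns 0xff.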

If you want to support rotate counts above 63, you probably need to mask.

(See "Best practices for circular shift (rotate) operations in C++". Since you're using intrinsics, though, you don't have to worry about C shift-count UB, just the actual, known hardware behaviour.)
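
For comparison, here is a sketch of a commonly used UB-free scalar rotate, which keeps both shift counts in 0..63. The function name is hypothetical. C++20 also provides std::rotl in <bit>.

```cpp
#include <cstdint>

// Scalar rotate-left with no undefined behaviour: both shift counts stay in 0..63,
// so n == 0 does not turn into a shift by 64.
uint64_t rotl64(uint64_t x, unsigned n) {
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}
```

rotl64(0xefcdab8967452301, 4) returns 0xfcdab8967452301e. That is the low element of the question's input, as loaded from memory, rotated left by 4.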

...