Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
848 views
in Technique[技术] by (71.8m points)

x86 - SSE multiplication of 2 64-bit integers

How to multiply two 64-bit integers by another 2 64-bit integers? I didn't find any instruction which can do it.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Late answer, but this is a better version of what Barabas posted.

If you have ever used GCC or Clang's vector extensions, this is the routine they use.

This uses the same method that long multiplication and grid multiplication use.

    65
  * 73
  ----
    15 //   (5 * 3)
   180 //   (6 * 3) * 10
   350 //   (5 * 7) * 10
+ 4200 // + (6 * 7) * 100
------
  4745

However, instead of doing each unit of 10, it uses each unit of 32 bits, and it leaves out the last multiply because it will always be shifted past the 64th bit, just like how you wouldn't multiply 6*7 if you were truncating values greater than 99.

#include <emmintrin.h>

/*
 * Grid/long multiply two 64-bit SSE lanes.
 * Works for both signed and unsigned.
 *   ----------------.--------------.----------------.
 *  |                |   b >> 32    | a & 0xFFFFFFFF |
 *  |----------------|--------------|----------------|  
 *  | d >> 32        |   b*d << 64  |    a*d << 32   |
 *  |----------------|--------------|----------------|
 *  | c & 0xFFFFFFFF |   b*c << 32  |       a*c      |
 *  '----------------'--------------'----------------'
 *  Add all of them together to get the product.
 *
 *  Because we truncate the value to 64 bits, b*d << 64 will be zero,
 *  so we can leave it out.
 *
 *  We also can add a*d and b*c first and then shift because of the
 *  distributive property: (a << 32) + (b << 32) == (a + b) << 32.
 */

__m128i Multiply64Bit(__m128i ab, __m128i cd)
{
    /* ac = (ab & 0xFFFFFFFF) * (cd & 0xFFFFFFFF); */
    __m128i ac = _mm_mul_epu32(ab, cd);

    /* b = ab >> 32; */
    __m128i b = _mm_srli_epi64(ab, 32);

    /* bc = b * (cd & 0xFFFFFFFF); */
    __m128i bc = _mm_mul_epu32(b, cd);

    /* d = cd >> 32; */
    __m128i d = _mm_srli_epi64(cd, 32);

    /* ad = (ab & 0xFFFFFFFF) * d; */
    __m128i ad = _mm_mul_epu32(ab, d);

    /* high = bc + ad; */
    __m128i high = _mm_add_epi64(bc, ad);

    /* high <<= 32; */
    high = _mm_slli_epi64(high, 32);

    /* return ac + high; */
    return _mm_add_epi64(high, ac);
}

Compiler Explorer Note: The GCC vector extension version was also included below for comparison.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...