Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
463 views
in Technique[技术] by (71.8m points)

c - How to perform uint32/float conversion with SSE?

In SSE there is a function _mm_cvtepi32_ps(__m128i input) which takes input vector of 32 bits wide signed integers (int32_t) and converts them into floats.

Now, I want to interpret input integers as not signed. But there is no function _mm_cvtepu32_ps and I could not find an implementation of one. Do you know where I can find such a function or at least give a hint on the implementation? To illustrate the the difference in results:

unsigned int a = 2480160505; // 10010011 11010100 00111110 11111001   
float a1 = a; // 01001111 00010011 11010100 00111111;  
float a2 = (signed int)a; // 11001110 11011000 01010111 10000010
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

With Paul R's solution and with my previous solution the difference between the rounded floating point and the original integer is less than or equal to 0.75 ULP (Unit in the Last Place). In these methods at two places rounding may occur: in _mm_cvtepi32_ps and in _mm_add_ps. This leads to results that are not as accurate as possible for some inputs.

For example, with Paul R's method 0x2000003=33554435 is converted to 33554432.0, but 33554436.0 also exists as a float, which would have been better here. My previous solution suffers from similar inaccuracies. Such inaccurate results may also occur with compiler generated code, see here.

Following the approach of gcc (see Peter Cordes' answer to that other SO question), an accurate conversion within 0.5 ULP is obtained:

inline __m128 _mm_cvtepu32_ps(const __m128i v)
{
    __m128i msk_lo    = _mm_set1_epi32(0xFFFF);
    __m128  cnst65536f= _mm_set1_ps(65536.0f);

    __m128i v_lo      = _mm_and_si128(v,msk_lo);          /* extract the 16 lowest significant bits of v                                   */
    __m128i v_hi      = _mm_srli_epi32(v,16);             /* 16 most significant bits of v                                                 */
    __m128  v_lo_flt  = _mm_cvtepi32_ps(v_lo);            /* No rounding                                                                   */
    __m128  v_hi_flt  = _mm_cvtepi32_ps(v_hi);            /* No rounding                                                                   */
            v_hi_flt  = _mm_mul_ps(cnst65536f,v_hi_flt);  /* No rounding                                                                   */
    return              _mm_add_ps(v_hi_flt,v_lo_flt);    /* Rounding may occur here, mul and add may fuse to fma for haswell and newer    */
}                                                         /* _mm_add_ps is guaranteed to give results with an error of at most 0.5 ULP     */

Note that other high bits/low bits partitions are possible as long as _mm_cvt_ps can convert both pieces to floats without rounding. For example, a partition with 20 high bits and 12 low bits will work equally well.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...