Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
613 views
in Technique[技术] by (71.8m points)

x86 - SSE intrinsics: Convert 32-bit floats to UNSIGNED 8-bit integers

Using SSE intrinsics, I've gotten a vector of four 32-bit floats clamped to the range 0-255 and rounded to nearest integer. I'd now like to write those four out as bytes.

There is an intrinsic _mm_cvtps_pi8 that will convert 32-bit to 8-bit signed int, but the problem there is that any value over 127 gets clamped to 127. I can't find any instructions that will clamp to unsigned 8-bit values.

I have an intuition that what I may want to do is some combination of _mm_cvtps_pi16 and _mm_shuffle_pi8 followed by move instruction to get the four bytes I care about into memory. Is that the best way to do it? I'm going to see if I can figure out how to encode the shuffle control mask.

UPDATE: The following appears to do exactly what I want. Is there a better way?

#include <tmmintrin.h>
#include <stdio.h>

unsigned char out[8];
unsigned char shuf[8] = { 0, 2, 4, 6, 128, 128, 128, 128 };
float ins[4] = {500, 0, 120, 240};

int main()
{
    __m128 x = _mm_load_ps(ins);    // Load the floats
    __m64 y = _mm_cvtps_pi16(x);    // Convert them to 16-bit ints
    __m64 sh = *(__m64*)shuf;       // Get the shuffle mask into a register
    y = _mm_shuffle_pi8(y, sh);     // Shuffle the lower byte of each into the first four bytes
    *(int*)out = _mm_cvtsi64_si32(y); // Store the lower 32 bits

    printf("%d
", out[0]);
    printf("%d
", out[1]);
    printf("%d
", out[2]);
    printf("%d
", out[3]);
    return 0;
}

UPDATE2: Here's an even better solution based on Harold's answer:

#include <smmintrin.h>
#include <stdio.h>

unsigned char out[8];
float ins[4] = {10.4, 10.6, 120, 100000};

int main()
{   
    __m128 x = _mm_load_ps(ins);       // Load the floats
    __m128i y = _mm_cvtps_epi32(x);    // Convert them to 32-bit ints
    y = _mm_packus_epi32(y, y);        // Pack down to 16 bits
    y = _mm_packus_epi16(y, y);        // Pack down to 8 bits
    *(int*)out = _mm_cvtsi128_si32(y); // Store the lower 32 bits

    printf("%d
", out[0]);
    printf("%d
", out[1]);
    printf("%d
", out[2]);
    printf("%d
", out[3]);
    return 0;
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

There is no direct conversion from float to byte, _mm_cvtps_pi8 is a composite. _mm_cvtps_pi16 is also a composite, and in this case it's just doing some pointless stuff that you undo with the shuffle. They also return annoying __m64's.

Anyway, we can convert to dwords (signed, but that doesn't matter), and then pack (unsigned) or shuffle them into bytes. _mm_shuffle_(e)pi8 generates a pshufb, Core2 45nm and AMD processors aren't too fond of it and you have to get a mask from somewhere.

Either way you don't have to round to the nearest integer first, the convert will do that. At least, if you haven't messed with the rounding mode.

Using packs 1: (not tested) -- probably not useful, packusdw already outputs unsigned words but then packuswb wants signed words again. Kept around because it is referred to elsewhere.

cvtps2dq xmm0, xmm0  
packusdw xmm0, xmm0     ; unsafe: saturates to a different range than packuswb accepts
packuswb xmm0, xmm0
movd somewhere, xmm0

Using different shuffles:

cvtps2dq xmm0, xmm0  
packssdw xmm0, xmm0     ; correct: signed saturation on first step to feed packuswb
packuswb xmm0, xmm0
movd somewhere, xmm0

Using shuffle: (not tested)

cvtps2dq xmm0, xmm0
pshufb xmm0, [shufmask]
movd somewhere, xmm0

shufmask: db 0, 4, 8, 12, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...