My first contact with vectorization intrinsics was SSE, where there is basically only one integer vector type, __m128i. Switching to Neon, I found the data types and function prototypes to be much more specific, e.g. uint8x16_t (a vector of 16 unsigned char), uint8x8x2_t (2 vectors of 8 unsigned char each), uint32x4_t (a vector of 4 uint32_t), etc.
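To make the contrast with SSE concrete: because these are distinct types, a uint8x16_t is generally not accepted where a uint32x4_t is expected (unless lax vector conversions are enabled), so every change of "view" needs an explicit vreinterpretq_* call. A minimal sketch, assuming an ARM target with Neon; the helper name shift_words_left_1 is hypothetical, and the #if guard makes the file compile to nothing on other targets.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Hypothetical helper: treat 16 bytes as 4 x uint32_t, shift each 32-bit lane
// left by one bit, and hand the result back as 16 bytes.
inline uint8x16_t shift_words_left_1(uint8x16_t bytes) {
    uint32x4_t words = vreinterpretq_u32_u8(bytes);  // same 128 bits, viewed as 4 x u32
    words = vshlq_n_u32(words, 1);                   // shift amount must be a constant
    return vreinterpretq_u8_u32(words);              // same 128 bits, viewed as 16 x u8
}
#endif
```

The reinterpret calls only relabel the register contents and normally compile to no instructions; the cost is purely in how much has to be written.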
At first I was enthusiastic (it is much easier to find the exact function operating on the desired data type), but then I saw what a mess it becomes when I want to treat the same data in different ways. Writing a specific cast every time would take me forever. The problem is also addressed here. I then came up with the idea of a union encapsulated in a struct, with some casting and assignment operators:
```cpp
#include <arm_neon.h>
#include <cstdint>

struct uint_128bit_t {
    union {
        uint8x16_t  uint8x16;
        uint16x8_t  uint16x8;
        uint32x4_t  uint32x4;
        uint8x8x2_t uint8x8x2;
        uint8_t  uint8_array[16] __attribute__ ((aligned (16)));
        uint16_t uint16_array[8] __attribute__ ((aligned (16)));
        uint32_t uint32_array[4] __attribute__ ((aligned (16)));
    };
    // Conversion operators: let the wrapper be passed directly to intrinsics.
    operator uint8x16_t& ()  { return uint8x16; }
    operator uint16x8_t& ()  { return uint16x8; }
    operator uint32x4_t& ()  { return uint32x4; }
    operator uint8x8x2_t& () { return uint8x8x2; }
    // Assignment operators: store intrinsic results into the wrapper.
    uint8x16_t& operator =(const uint8x16_t& in)   { uint8x16 = in;  return uint8x16; }
    uint8x8x2_t& operator =(const uint8x8x2_t& in) { uint8x8x2 = in; return uint8x8x2; }
};
```
This approach works for me: I can use a variable of type uint_128bit_t as an argument to and as an output of different Neon intrinsics, e.g. vshlq_n_u32, vuzp_u8 and vget_low_u8 (the last one only as input). And I can extend it with more data types if needed.
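A sketch of that usage, assuming an ARM target with Neon (the #if guard makes it compile to nothing elsewhere). It repeats a condensed copy of uint_128bit_t, keeping only the members it uses, so it compiles on its own; the function name shift_then_unzip is hypothetical. Since the struct declares no operator= for uint32x4_t, the shifted result is stored through the uint32x4 member directly rather than relying on an implicit vector conversion.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

// Condensed copy of uint_128bit_t (only the members this sketch needs).
struct uint_128bit_t {
    union {
        uint8x16_t  uint8x16;
        uint32x4_t  uint32x4;
        uint8x8x2_t uint8x8x2;
    };
    operator uint8x16_t& ()  { return uint8x16; }
    operator uint32x4_t& ()  { return uint32x4; }
    operator uint8x8x2_t& () { return uint8x8x2; }
    uint8x16_t& operator =(const uint8x16_t& in)   { uint8x16 = in;  return uint8x16; }
    uint8x8x2_t& operator =(const uint8x8x2_t& in) { uint8x8x2 = in; return uint8x8x2; }
};

// Hypothetical helper: shift each 32-bit lane left by one bit, then unzip the
// low and high 8-byte halves into even-indexed and odd-indexed bytes.
inline uint_128bit_t shift_then_unzip(uint8x16_t input) {
    uint_128bit_t v;
    v = input;                                     // operator =(const uint8x16_t&)
    v.uint32x4 = vshlq_n_u32(v, 1);                // v converts to uint32x4_t&
    v = vuzp_u8(vget_low_u8(v), vget_high_u8(v));  // v converts to uint8x16_t&; result via operator =(const uint8x8x2_t&)
    return v;
}
#endif
```

After the call, uint8x8x2.val[0] holds the even-indexed bytes of the shifted vector and val[1] the odd-indexed ones.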
Note: The arrays are there to make it easy to print the content of a variable.
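For comparison, a sketch of printing without array members: a hypothetical dump_bytes helper that copies the 16 bytes of any 128-bit value with std::memcpy and formats them as hex. It is plain portable C++ (no arm_neon.h), so it can be tried with an ordinary 16-byte array; on ARM it should also accept uint8x16_t, uint32x4_t or uint_128bit_t directly, assuming the compiler treats them as 16-byte trivially copyable types.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: hex dump of the raw bytes of a 16-byte value, lowest address first.
template <typename V>
std::string dump_bytes(const V& v) {
    static_assert(sizeof(V) == 16, "expects a 128-bit value");
    static_assert(std::is_trivially_copyable<V>::value, "memcpy needs a trivially copyable type");
    unsigned char bytes[16];
    std::memcpy(bytes, &v, sizeof bytes);  // copy the object representation byte by byte
    std::string out;
    char buf[4];                           // " xx" plus terminating NUL
    for (std::size_t i = 0; i < sizeof bytes; ++i) {
        std::snprintf(buf, sizeof buf, i == 0 ? "%02x" : " %02x", static_cast<unsigned>(bytes[i]));
        out += buf;
    }
    return out;
}
```

The output follows memory order, so on a little-endian target the least significant byte of each uint32_t lane is printed first.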
Is this a correct way of proceeding?
Is there any hidden flaw?
Have I reinvented the wheel?
(Is the aligned attribute necessary?)