The Intel Advanced Vector Extensions (AVX) offer no dot product instruction in the 256-bit version (YMM registers) for double-precision floating-point values. The "Why?" question has been treated very briefly in another forum (here) and on Stack Overflow (here). The question I am facing is how to replace this missing instruction with other AVX instructions in an efficient way.
A 256-bit dot product does exist for single-precision floating-point values (reference here):
__m256 _mm256_dp_ps(__m256 m1, __m256 m2, const int mask);
The idea is to find an efficient equivalent for this missing instruction:
__m256d _mm256_dp_pd(__m256d m1, __m256d m2, const int mask);
To be more specific, the code I would like to port from __m128 (four floats) to __m256d (four doubles) uses the following instructions:
__m128 val0 = ...; // Four float values
__m128 val1 = ...; //
__m128 val2 = ...; //
__m128 val3 = ...; //
__m128 val4 = ...; //
__m128 res = _mm_or_ps( _mm_dp_ps(val1, val0, 0xF1),
             _mm_or_ps( _mm_dp_ps(val2, val0, 0xF2),
             _mm_or_ps( _mm_dp_ps(val3, val0, 0xF4),
                        _mm_dp_ps(val4, val0, 0xF8) )));
The result of this code is a __m128 vector of four floats containing the results of the dot products between val1 and val0, val2 and val0, val3 and val0, and val4 and val0.
Perhaps this gives a hint as to what kind of replacement I am looking for.
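For illustration, here is a minimal sketch of one possible direction, not a definitive answer: multiply element-wise, add adjacent pairs with _mm256_hadd_pd, then combine the two 128-bit halves with _mm256_permute2f128_pd and _mm256_blend_pd. The helper name dot4_pd and its pointer-based signature are hypothetical, chosen only for this example, and the target attribute is a GCC/Clang assumption that can be dropped when compiling with -mavx.

```c
#include <immintrin.h>

/* Hypothetical helper (not an Intel intrinsic): computes the four dot
 * products val1.val0, val2.val0, val3.val0, val4.val0 and stores them
 * in out[0..3]. Each input points to four doubles. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("avx"))) /* assumption: GCC/Clang; unnecessary with -mavx */
#endif
void dot4_pd(const double *val0, const double *val1, const double *val2,
             const double *val3, const double *val4, double *out)
{
    __m256d v0 = _mm256_loadu_pd(val0);

    /* Element-wise products; each register holds the four terms of one dot product. */
    __m256d p1 = _mm256_mul_pd(_mm256_loadu_pd(val1), v0);
    __m256d p2 = _mm256_mul_pd(_mm256_loadu_pd(val2), v0);
    __m256d p3 = _mm256_mul_pd(_mm256_loadu_pd(val3), v0);
    __m256d p4 = _mm256_mul_pd(_mm256_loadu_pd(val4), v0);

    /* Low to high: p1[0]+p1[1], p2[0]+p2[1], p1[2]+p1[3], p2[2]+p2[3] */
    __m256d h12 = _mm256_hadd_pd(p1, p2);
    /* Low to high: p3[0]+p3[1], p4[0]+p4[1], p3[2]+p3[3], p4[2]+p4[3] */
    __m256d h34 = _mm256_hadd_pd(p3, p4);

    /* High half of h12 followed by low half of h34. */
    __m256d swapped = _mm256_permute2f128_pd(h12, h34, 0x21);
    /* Low half of h12 followed by high half of h34. */
    __m256d blended = _mm256_blend_pd(h12, h34, 0x0C);

    /* Adding the two yields one complete dot product per element. */
    _mm256_storeu_pd(out, _mm256_add_pd(swapped, blended));
}
```

Note that each sum is evaluated as (a0*b0 + a1*b1) + (a2*b2 + a3*b3), so rounding may differ slightly from a strictly sequential accumulation. Unlike _mm_dp_ps, this sketch has no mask parameter; it always uses all four elements and always writes all four results.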