The Haswell architecture comes with several new instructions. One of them is PEXT
(parallel bits extract), whose functionality is explained by this image (source here):
It takes a value r2 and a mask r3 and puts the extracted bits of r2 into r1.
My question is the following: what would be the equivalent code of an optimized templated function in pure standard C++11 that compilers would be likely to optimize to this instruction in the future?
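To pin down the semantics being asked about, here is a minimal reference sketch in standard C++11. The name pext_reference is hypothetical, and the loop is only one straightforward way to express the operation: it walks the set bits of the mask from lowest to highest and packs the corresponding bits of the value contiguously into the low-order bits of the result. Whether any given compiler would recognize this pattern and emit PEXT is an assumption, not something this sketch guarantees.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical reference implementation of PEXT semantics (not a compiler intrinsic).
// For each set bit of `mask`, from least to most significant, the matching bit
// of `value` is copied into the next free low-order bit of the result.
template <typename T>
T pext_reference(T value, T mask) {
    static_assert(std::is_unsigned<T>::value,
                  "pext_reference requires an unsigned integer type");
    T result = 0;
    T out_bit = 1;  // next destination bit in the result
    while (mask != 0) {
        // Isolate the lowest set bit of the mask.
        const T lowest = static_cast<T>(mask & (~mask + 1));
        if (value & lowest) {
            result = static_cast<T>(result | out_bit);
        }
        out_bit = static_cast<T>(out_bit << 1);
        // Clear the lowest set bit of the mask and continue.
        mask = static_cast<T>(mask & (mask - 1));
    }
    return result;
}
```

For example, pext_reference<std::uint32_t>(0x12345678u, 0x0000FF00u) selects bits 8 to 15 of the value and yields 0x56. The loop runs once per set bit in the mask, so its cost depends on the mask's population count rather than on the width of T.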