Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

vectorization - Is _mm_broadcast_ss faster than _mm_set1_ps?

Is this code

float a = ...;
__m128 b = _mm_broadcast_ss(&a);

always faster than this code

float a = ...;
__m128 b = _mm_set1_ps(a);

?

What if a is defined as static const float a = ... rather than float a = ...?


1 Answer


_mm_broadcast_ss has weaknesses imposed by the architecture that the _mm_* SSE intrinsics API largely hides. The most important difference is:

  • _mm_broadcast_ss is limited to loading values from memory only.

This means that if you use _mm_broadcast_ss explicitly when the source value is not in memory, the result will likely be less efficient than using _mm_set1_ps. This typically happens when loading immediate values (constants) or when using the result of a recent calculation. In those cases the compiler will have mapped the value to a register, so to broadcast it the compiler must first dump the value back to memory. Alternatively, a pshufd could be used to splat directly from the register instead.

_mm_set1_ps is implementation-defined rather than being mapped to a specific underlying CPU instruction. That means the compiler may use any of several SSE instructions to perform the splat. A smart compiler with AVX support enabled should use vbroadcastss internally when appropriate, but whether it does depends on how mature the AVX support in the compiler's optimizer is.

If you're very confident you're loading from memory, such as when iterating over an array of data, then direct use of broadcast is fine. But if there's any doubt at all, I would recommend sticking with _mm_set1_ps.
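To make the two cases concrete, here is a minimal sketch rather than a benchmark. The function names scale_by_table_entry and scale_by_value are hypothetical, and the target("avx") attribute is a GCC/Clang extension used here only so the broadcast compiles without a global -mavx flag. The first function broadcasts a value that already lives in memory; the second splats a value that is probably sitting in a register.

```c
#include <assert.h>
#include <immintrin.h>

/* Case 1: the factor already lives in memory (for example, an array element),
 * so _mm_broadcast_ss can load and splat it straight from that address.
 * The target attribute is a GCC/Clang extension (an assumption of this sketch). */
__attribute__((target("avx")))
static void scale_by_table_entry(float *dst, const float *src, const float *factor)
{
    assert(factor);                       /* the intrinsic dereferences this pointer */
    __m128 f = _mm_broadcast_ss(factor);  /* memory operand -> all four lanes */
    __m128 v = _mm_loadu_ps(src);         /* unaligned load of four floats */
    _mm_storeu_ps(dst, _mm_mul_ps(v, f));
}

/* Case 2: the factor is a constant or a freshly computed value, so it is
 * passed by value and the compiler is left to choose how to splat it. */
static void scale_by_value(float *dst, const float *src, float factor)
{
    __m128 f = _mm_set1_ps(factor);       /* instruction choice is up to the compiler */
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, _mm_mul_ps(v, f));
}
```

Both functions produce the same result for the same factor; the difference lies only in how much freedom the compiler has when generating the splat. Comparing the generated assembly for your own compiler and flags is the reliable way to see which one it handles better.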

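And in the specific case of a static const float, you absolutely want to avoid using _mm_broadcast_ss(): this is the constant case described above, where the value would otherwise not need to be read from memory at all.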
