According to the documentation, there is an fma() function in math.h. That is very nice, and I know how FMA works and what to use it for. However, I am not so certain how this is implemented in practice. I'm mostly interested in the x86 and x86_64 architectures.
Is there a scalar (non-vector) floating-point instruction for FMA, perhaps as defined by IEEE 754-2008?
Is an FMA3 or FMA4 instruction used?
Is there an intrinsic that guarantees a real FMA is used when the code relies on its precision?