Compiling with -msse2
on an Intel/AMD processor that supports it will get you almost there. Do not let any library put the FPU in FTZ/DNZ mode, and you will be mostly set (processor bugs notwithstanding).
For other architectures, the answer would be different. Those achitectures that do not offer any convenient way to get exact IEEE 754 semantics (for instance, pre-SSE2 IA32 CPUs) would require the use of a floating-point emulation library to get the result you want, at a very high performance penalty.
If your target architecture supports the fmadd
(multiplication and addition without intermediate rounding) instruction, make sure your compiler does not use it when you have explicit multiplications and additions in the source code. GCC is not supposed to do this unless you use the -ffast-math option.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…