GCC, MSVC, LLVM, and probably other toolchains have support for link-time (whole program) optimization to allow optimization of calls among compilation units.
Is there a reason not to enable this option when compiling production software?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…