TL:DR: current compilers still have bool missed-optimizations when doing stuff like (a&&b) ? x : y. But the reason isn't that they don't assume 0/1; they just suck at this.
Many uses of bool are for locals, or in inline functions, so booleanizing to a 0 / 1 can optimize away, and the code can branch (or cmov or whatever) on the original condition. Only worry about optimizing bool inputs / outputs when they actually have to be passed/returned across something that doesn't inline, or really stored in memory.
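For example (a hypothetical snippet, not from the question), a local bool set from a comparison usually never has to exist as an actual 0/1 value:

int clamp_negative_to_zero(int x) {   // hypothetical example
    bool neg = (x < 0);   // the 0/1 booleanization can optimize away entirely
    return neg ? 0 : x;   // compilers typically branch or cmov on the original compare
}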
Possible optimization guideline: combine bools from external sources (function args / memory) with bitwise operators, like a&b. MSVC and ICC do better with this. IDK if it's ever worse for local bools. Beware that a&b is only equivalent to a&&b for bool, not for integer types: 2 && 1 is true, but 2 & 1 is 0, which is false. Bitwise OR doesn't have this problem.
IDK if this guideline will ever hurt for locals that were set from a comparison within the function (or in something that inlined). E.g. it might lead the compiler to actually make integer booleans instead of just using comparison results directly when possible. Also note that it doesn't seem to help with current gcc and clang.
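A sketch of that guideline (hypothetical function names; whether it actually helps depends on the compiler, per the caveats above):

bool both(bool a, bool b)   { return a & b; }  // only equivalent to a&&b because bool is 0/1
bool either(bool a, bool b) { return a | b; }  // bitwise | keeps truthiness even for integers
// with ints this would be a bug: 2 && 1 is true, but 2 & 1 == 0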
Yes, C++ implementations on x86 store bool in a byte that's always 0 or 1 (at least across function-call boundaries where the compiler has to respect the ABI / calling convention, which requires this).
Compilers do sometimes take advantage of this, e.g. for bool->int conversion even gcc 4.4 simply zero-extends to 32-bit (movzx eax, dil). Clang and MSVC do this, too. C and C++ rules require this conversion to produce 0 or 1, so this behaviour is only safe if it's always safe to assume that a bool function arg or global variable has a 0 or 1 value.
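For example, a plain bool-to-int conversion compiles to just a zero-extension (sketch; the asm reflects the x86-64 System V codegen described above):

int to_int(bool b) { return b; }
    # gcc/clang -O3 for x86-64 System V:
    movzx   eax, dil         # trust the 0/1 invariant: no cmp/setcc re-booleanization
    ret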
Even old compilers typically did take advantage of it for bool->int, but not in other cases. Thus, Agner is wrong about the reason when he says:
The reason why the compiler doesn't make such an assumption is that the variables might have other values if they are uninitialized or come from unknown sources.
MSVC CL19 does make code that assumes bool function args are 0 or 1, so the Windows x86-64 ABI must guarantee this.
In the x86-64 System V ABI (used by everything other than Windows), the changelog for revision 0.98 says "Specify that _Bool (aka bool) is booleanized at the caller." I think compilers were assuming this even before that change; it just documents what compilers were already relying on. The current language in the x86-64 SysV ABI is:
3.1.2 Data Representation
Booleans, when stored in a memory object, are stored as single byte objects the value of which is always 0 (false) or 1 (true). When stored in integer registers (except for passing as arguments), all 8 bytes of the register are significant; any nonzero value is considered true.
The second sentence is nonsense: the ABI has no business telling compilers how to store things in registers inside a function, only at boundaries between different compilation units (memory / function args and return values). I reported this ABI defect a while ago on the github page where it's maintained.
3.2.3 Parameter passing:
When a value of type _Bool is returned or passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 7 shall be zero (footnote 16).
(footnote 16): Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 or 1 when truncated to 8 bit.
The language in the i386 System V ABI is the same, IIRC.
Any compiler that assumes 0/1 for one thing (e.g. conversion to int) but fails to take advantage of it in other cases has a missed optimization. Unfortunately such missed optimizations still exist, although they are rarer than when Agner wrote that paragraph about compilers always re-booleanizing.
(Source + asm on the Godbolt compiler explorer for gcc4.6 / 4.7, and clang/MSVC. See also Matt Godbolt's CppCon2017 talk What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid)
bool logical_or(bool a, bool b) { return a||b; }
# gcc4.6.4 -O3 for the x86-64 System V ABI
test dil, dil # test a against itself (for non-zero)
mov eax, 1
cmove eax, esi # return a ? 1 : b;
ret
So even gcc4.6 didn't re-booleanize b, but it did miss the optimization that gcc4.7 makes (and that clang and later compilers make, as shown in other answers):
# gcc4.7 -O3 to present: looks ideal to me.
mov eax, esi
or eax, edi
ret
(Clang's or dil, sil / mov eax, edi is silly: it's guaranteed to cause a partial-register stall on Nehalem or earlier Intel when reading edi after writing dil, and it has worse code size from needing a REX prefix to use the low-8 part of edi. A better choice might be or dil,sil / movzx eax, dil if you want to avoid reading any 32-bit registers in case your caller left some arg-passing registers with "dirty" partial registers.)
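Spelled out, that suggested sequence would look like this (hand-written; not what any of these compilers actually emit):

    or      dil, sil     # a |= b; both inputs are 0/1, so the result is too
    movzx   eax, dil     # zero-extend into the return reg without reading a 32-bit reg
    ret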
MSVC emits this code that checks a then b separately, completely failing to take advantage of anything, and even using xor al,al instead of xor eax,eax. So it has a false dependency on the old value of eax on most CPUs (including Haswell/Skylake, which don't rename low-8 partial regs separately from the whole register, only AH/BH/...). This is just dumb. The only reason to ever use xor al,al is when you explicitly want to preserve the upper bytes.
logical_or PROC ; x86-64 MSVC CL19
test cl, cl ; Windows ABI passes args in ecx, edx
jne SHORT $LN3@logical_or
test dl, dl
jne SHORT $LN3@logical_or
xor al, al ; missed peephole: xor eax,eax is strictly better
ret 0
$LN3@logical_or:
mov al, 1
ret 0
logical_or ENDP
ICC18 also doesn't take advantage of the known 0/1 nature of the inputs; it just uses an or instruction to set flags according to the bitwise OR of the two inputs, and setcc to produce a 0/1.
logical_or(bool, bool): # ICC18
xor eax, eax #4.42
movzx edi, dil #4.33
movzx esi, sil #4.33
or edi, esi #4.42
setne al #4.42
ret #4.42
ICC emits the same code even for bool bitwise_or(bool a, bool b) { return a|b; }. It promotes to int (with movzx), and uses or to set flags according to the bitwise OR. This is dumb compared to or dil,sil / setne al.
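Spelled out (a hand-written sketch, not ICC's actual output):

    or      dil, sil     # set flags from the bitwise OR of the two byte args, no movzx needed
    setne   al           # booleanize: al = (a | b) != 0
    ret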
For bitwise_or, MSVC does just use an or instruction (after movzx on each input), but at least it doesn't re-booleanize.
Missed optimizations in current gcc/clang:
Only ICC/MSVC were making dumb code with the simple function above, but this function still gives gcc and clang trouble:
int select(bool a, bool b, int x, int y) {
    return (a&&b) ? x : y;
}
<a href="https://gcc.godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAViAIzkDO6ArqatiAOQCkATAGYCAO1QAbVjgDUvQQGEmRTACN06cQDokc3NP0B6A9IBiZafN4AGAILWbajdPHpgBVAENxAfQ8jMEI7i0h7k0kHhAJSyAOwAQtKk2ETsIiECAGyZKnIJvDEAIvYRLm6ePmSB6sGh4dVRsQlJKaRpHvmWMfI5gnmFxfUqxADuBEzY3pURtREq0flNyanpXT19Rbb2okTS4%2BLYqERVTjODYdvSAB7nIjsAnvPx9vr6zcsQ7fxZX3OygiZXaQgaR3XL2fIbOy2C57A5EfjHGphWY3HbXaQXB6NZ4vaYqP4FdJfbJg2wvRJLVrSD6/OQAy5AkGkuz9TbQ267bD7Q7eIZEPwBabIs4YjnozGPOI416UtIfbLzf6A4Gg3rg1lQzVGCwGSz8KX6xKsfZMCkAR1YBCSFJaIlEwGkVmk5noWw5QW8JG820RIUlNuW7V6sUhAycqCQHlInvQvOqEAjUb9jQDVKD63Btm1wFQqGcrncXnEd2kGBEADdsKQiKaSKKdgLwh5UABrMIASWkRCQohbmns2owxswzoADkQCABbAgAL2wXfQK264VYO0w6GwTBEYG4RH7tmmmEF9VOTl%2B5JeC1TbQEcTWIfs3Ei5HEPAArNxyCIeFZP%2BgeHqBoJCw7CcLIQj0J%2BRA/k%2Bz4tiAghvpoAAsghWPQMT8AAHDEACcb78Mhb5YchL48Mhn6TiAb4xChMQxPQ9BWFhWFvlYRExFhX4weQ/7cJ%2BTAgFY5DQdwv7PnAsAoBgk6jgQ%2BwUFQCboLJ8lVmg4j0Lh3gZKRABm8lEFWgmBDxQwiFGdw8JB5AyZO2C3AA8iIxY8Tgk5%2BMA%2BxuVacIEJWgliZ%2B2CXAcq5cNwNnbFyPFEKQU7WU%2BL4ECogmQM%2B6DjgQ6AiIFAC0jmXAJbAcFw9DPq%2B3AftxQW8TwlxYRkeW6dIACyADKABq8jSFp0j8OhMTSAASgAKgA6tS%2BDEOYAiCIwFgqXJCngfNkRQTBkTPkg2AeDgpDUBV5GUSAyH8JoBHYVhuGMch%2BFnfwNW/nV/HMEJImbXBCFIfRVgQRk/Bvhkb70BkMRvmR3CCJ%2B361XxG1BRJiAQNJS1qYp1Ayct6nAFuHijkwSDoEQ5AGeIRmkCZKhmaIlmJbZKn2U5LlWZFn7uZ53m1YQSSHP5m48SFYVGfT0WVc9cUJWzFUpWlEAZVlOX5eNRbSAV0PMCVnAM