Compiling the following code with MSVC 2013 (64-bit release build, /O2 optimization):
    while (*s == ' ' || *s == ',' || *s == '\r' || *s == '\n') {
        ++s;
    }
I get the following code, which has a really cool optimization using a 64-bit register as a lookup table with the bt (bit test) instruction:
        mov     rcx, 17596481020928     ; 0000100100002400H
        npad    5
    $LL82@myFunc:
        movzx   eax, BYTE PTR [rsi]
        cmp     al, 44                  ; 0000002cH
        ja      SHORT $LN81@myFunc
        movsx   rax, al
        bt      rcx, rax
        jae     SHORT $LN81@myFunc
        inc     rsi
        jmp     SHORT $LL82@myFunc
    $LN81@myFunc:
        ; code after loop...
But my question is: what is the purpose of the movsx rax, al after the first branch?
First we load a byte from the string and zero-extend it into eax (writing eax also clears the upper 32 bits of rax):

    movzx eax, BYTE PTR [rsi]
Then the cmp/ja pair performs an unsigned comparison between al and 44, and branches forwards if al is greater.
So now we know 0 <= al <= 44 in unsigned numbers. Therefore, the highest bit of al cannot possibly be set!
Nonetheless, the next instruction is movsx rax, al. This is a sign-extending move. But since:
- al is the lowest byte of rax
- we already know the other 7 bytes of rax are zeroed
- we just proved that al's highest bit could not possibly be set
this movsx must be a no-op.
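The argument above can be checked mechanically. The following is only a sketch with helper names I made up (`zeroExtend`, `signExtend`, `extensionsAgreeUpTo44`); it models movzx and movsx with integer casts and assumes the usual two's-complement behaviour for the narrowing cast:

```cpp
#include <cstdint>

// Models movzx: the upper 56 bits become zero.
std::uint64_t zeroExtend(std::uint8_t b) {
    return b;
}

// Models movsx: the upper 56 bits become copies of bit 7.
std::uint64_t signExtend(std::uint8_t b) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int8_t>(b)));
}

// Hypothetical check: every value that survives the cmp/ja test (0..44)
// gives the same 64-bit result under both extensions.
bool extensionsAgreeUpTo44() {
    for (unsigned v = 0; v <= 44; ++v) {
        std::uint8_t al = static_cast<std::uint8_t>(v);
        if (zeroExtend(al) != signExtend(al)) return false;
    }
    return true;
}
```

The two extensions only diverge once bit 7 is set (values 128 and up), which the cmp/ja pair has already ruled out.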
Why does MSVC do it? I'm assuming it's not for padding, since in that case another npad would make the meaning clearer. Is it flushing data dependencies or something?
(By the way, this bt optimization really makes me happy. Some interesting facts: it runs in 0.6x the time of the 4 cmp/je pairs you might expect, it's way faster than strspn or std::string::find_first_not_of, and it only happens in 64-bit builds, even if the characters of interest have values under 32.)