Yes, `mov` to a register and then to memory for immediates that won't fit in a sign-extended 32-bit immediate (unlike `-1`, aka `0xFFFFFFFFFFFFFFFF`, which does fit). The "why" part is an interesting question, though:
Remember that asm only lets you do what's possible in machine code. Thus it's really a question about ISA design. Such decisions often involve what's easy for the hardware to decode, as well as encoding efficiency considerations. (Using up opcodes on rarely-used instructions would be bad.)
It's not designed to make things harder; it's designed to not need any new opcodes for `mov`, and also to limit 64-bit immediates to one special instruction format. `mov` is the only instruction that can ever use a 64-bit immediate at all (or a 64-bit absolute address, for load/store of AL/AX/EAX/RAX).
Check out Intel's manual for the forms of `mov` (note that it uses Intel syntax, destination first, and so will my answer). I also summarized the forms (and their instruction lengths) in "Difference between movq and movabsq in x86-64", as did @MargaretBloom in answer to "What's the difference between the x86-64 AT&T instructions movq and movabsq?".
Allowing an imm64 along with a ModR/M addressing mode would also make it possible to run into the 15-byte upper limit on instruction length pretty easily: REX + opcode + imm64 is 10 bytes, and ModRM + SIB + disp32 is 6, for 16 in total. So `mov [rdi + rax*8 + 1234], imm64` would not be encodable even if there were an opcode for `mov r/m64, imm64`.
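The byte-count arithmetic above can be sketched in a few lines of Python. This is only a model of the sizes, not an encoder; the helper name `insn_length` is made up for illustration, and the `mov r/m64, imm64` form it measures is hypothetical (no such opcode exists).

```python
# Architectural upper limit on the length of one x86 instruction, in bytes.
MAX_INSN_LEN = 15

def insn_length(rex=0, opcode=1, modrm=0, sib=0, disp=0, imm=0):
    """Sum the sizes (in bytes) of the parts that make up one instruction."""
    return rex + opcode + modrm + sib + disp + imm

# Real instruction: movabs rax, imm64 = REX.W + opcode + 8-byte immediate.
movabs_len = insn_length(rex=1, opcode=1, imm=8)

# Hypothetical instruction: mov [rdi + rax*8 + 1234], imm64
# = REX.W + opcode + ModRM + SIB + disp32 + 8-byte immediate.
hypothetical_len = insn_length(rex=1, opcode=1, modrm=1, sib=1, disp=4, imm=8)

print(movabs_len)                        # 10
print(hypothetical_len)                  # 16
print(hypothetical_len <= MAX_INSN_LEN)  # False: one byte over the limit
```

Any segment-override or other legacy prefix would push the hypothetical form even further past the limit.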
And that's assuming they repurposed one of the 1-byte opcodes that were freed up by making some instructions invalid in 64-bit mode (e.g. `aaa`), which might be inconvenient for the decoders (and instruction-length pre-decoders) because in other modes those opcodes don't take a ModRM byte or an immediate.
`movq` is for the forms of `mov` with a normal ModRM byte to allow an arbitrary addressing mode as the destination (or as the source, for `movq r64, r/m64`). AMD chose to keep the immediate for these as 32-bit, same as with 32-bit operand-size (see footnote 1).
These forms of `mov` are the same instruction format as other instructions like `add`. For ease of decoding, this means a REX prefix doesn't change the instruction length for these opcodes. Instruction-length decoding is already hard enough when the addressing mode is variable-length.
So `movq` is 64-bit operand-size but otherwise the same instruction format: `mov r/m64, imm32` (becoming the sign-extended-immediate form, same as every other instruction which only has one immediate form), and `mov r/m64, r64` or `mov r64, r/m64`.
`movabs` is the 64-bit form of the existing no-ModRM short form `mov reg, imm32`. This one is already a special case (because of the no-ModRM encoding, with the register number in the low 3 bits of the opcode byte). Small positive constants can just use 32-bit operand-size for implicit zero-extension to 64-bit with no loss of efficiency (like 5-byte `mov eax, 123` / AT&T `mov $123, %eax` in 32 or 64-bit mode). And having a 64-bit absolute `mov` is useful, so it makes sense that AMD did that.
Since there's no ModRM byte, it can only encode a register destination. It would take a whole different opcode to add a form that could take a memory operand.
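To make the three register-destination forms concrete, here is a rough Python sketch that picks the shortest encoding the way an assembler might. The function name `encode_mov_reg_imm` is invented for this example, it only handles a 64-bit destination register given by number (0 = RAX ... 15 = R15), and it is not meant as a complete assembler.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def encode_mov_reg_imm(reg, value):
    """Return machine code for `mov reg64, value`, shortest form first."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0..15")
    if not -(1 << 63) <= value <= MASK64:
        raise ValueError("value does not fit in 64 bits")
    low3 = reg & 7         # register bits that go in the opcode or ModRM byte
    ext = reg >> 3         # 1 for r8..r15, carried in REX.B
    unsigned = value & MASK64
    signed = unsigned - (1 << 64) if unsigned >> 63 else unsigned

    if unsigned <= 0xFFFFFFFF:
        # mov r32, imm32 (B8+rd id): writing the 32-bit register zero-extends.
        prefix = bytes([0x41]) if ext else b""
        return prefix + bytes([0xB8 + low3]) + struct.pack("<I", unsigned)
    if -(1 << 31) <= signed < 0:
        # mov r/m64, imm32 (REX.W C7 /0 id): immediate is sign-extended.
        return bytes([0x48 | ext, 0xC7, 0xC0 | low3]) + struct.pack("<i", signed)
    # movabs: mov r64, imm64 (REX.W B8+rd io), no ModRM byte at all.
    return bytes([0x48 | ext, 0xB8 + low3]) + struct.pack("<Q", unsigned)

print(encode_mov_reg_imm(0, 123).hex())                 # b87b000000
print(encode_mov_reg_imm(0, -123).hex())                # 48c7c085ffffff
print(encode_mov_reg_imm(0, 0x123456789ABCDEF0).hex())  # 48b8f0debc9a78563412
```

Note that only the middle form contains a ModRM byte (`0xC0 | low3` selects a register operand), which is why it is the only one of the three that could target memory instead.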
From one POV, be grateful you get a `mov` with 64-bit immediates at all; RISC ISAs like AArch64 (with fixed-width 32-bit instructions) need more like 4 instructions just to get a 64-bit value into a register. (Unless it's a repeating bit-pattern; AArch64 is actually pretty cool here, unlike earlier RISCs like MIPS64 or PowerPC64.)
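As a rough illustration of the "more like 4 instructions" point, this Python sketch splits a 64-bit constant into 16-bit chunks and emits the usual `movz` / `movk` sequence as text. The function name `aarch64_mov_imm64` is made up for this example, and it is deliberately naive: it ignores `movn` and the bitmask-immediate forms that real assemblers and compilers also consider.

```python
def aarch64_mov_imm64(reg, value):
    """Return a list of AArch64 asm lines that build `value` in `reg`."""
    value &= 0xFFFFFFFFFFFFFFFF
    chunks = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    lines = []
    for i, chunk in enumerate(chunks):
        if chunk == 0:
            continue  # movz already zeroes every chunk it doesn't write
        op = "movz" if not lines else "movk"  # first write zeroes, rest keep
        shift = f", lsl #{16 * i}" if i else ""
        lines.append(f"{op} {reg}, #0x{chunk:x}{shift}")
    return lines or [f"movz {reg}, #0"]

for line in aarch64_mov_imm64("x0", 0x123456789ABCDEF0):
    print(line)
# movz x0, #0xdef0
# movk x0, #0x9abc, lsl #16
# movk x0, #0x5678, lsl #32
# movk x0, #0x1234, lsl #48
```

Each of those lines is a 4-byte instruction, so an arbitrary constant costs 16 bytes of code here versus 10 for x86-64 `movabs`.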
If AMD64 was going to introduce a new opcode for `mov`, `mov r/m, sign_extended_imm8` would be vastly more useful to save code-size. It's not at all rare for compilers to emit multiple `mov qword ptr [rsp+8], 0` instructions to zero a local array or struct, each one containing a 4-byte `0` immediate. Putting a small non-zero number in a register is also fairly common, and such a form would make `mov eax, 123` a 3-byte instruction (down from 5), and `mov rax, -123` a 4-byte instruction (down from 7). It would also make zeroing a register without clobbering FLAGS take 3 bytes.
Allowing `mov` imm64 to memory would be useful rarely enough that AMD decided it wasn't worth making the decoders more complex. In this case I agree with them, but AMD was very conservative about adding new opcodes. There were many missed opportunities to clean up x86 warts; widening `setcc`, for example, would have been nice. But I think AMD wasn't sure AMD64 would catch on, and didn't want to be stuck needing a lot of extra transistors / power to support a feature if people didn't use it.
Footnote 1:
Using 32-bit immediates in general is pretty obviously a good decision for code-size. It's very rare to want to `add` an immediate that's outside the +-2GiB range. A wider immediate could be useful for bitwise stuff like `AND`, but for setting/clearing/flipping a single bit the `bts` / `btr` / `btc` instructions are good (taking a bit-position as an 8-bit immediate, instead of needing a mask). You don't want `sub rsp, 1024` to be an 11-byte instruction; 7 is already bad enough.
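The "fits in a sign-extended immediate" test that decides between the imm8, imm32 and (for `mov` only) imm64 forms can be sketched like this. The helper name `fits_signed` is invented for illustration; it treats its argument as a 64-bit two's-complement value.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def fits_signed(value, bits):
    """True if truncating to `bits` bits and sign-extending gives `value` back."""
    value &= MASK64
    low = value & ((1 << bits) - 1)         # keep only the low `bits` bits
    if low >> (bits - 1):                   # top kept bit set: extend with 1s
        low |= MASK64 & ~((1 << bits) - 1)
    return low == value

print(fits_signed(-1, 32))          # True:  0xFFFFFFFFFFFFFFFF is just -1
print(fits_signed(0xFFFFFFFF, 32))  # False: would sign-extend to -1 instead
print(fits_signed(1024, 8))         # False: sub rsp, 1024 can't use imm8
print(fits_signed(1024, 32))        # True:  so it uses the imm32 form
```

The second case is the classic trap: `0x00000000FFFFFFFF` needs either a zero-extending 32-bit `mov` or a full 64-bit immediate, because a sign-extended imm32 can't represent it.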
Giant instructions? Not very efficient
At the time AMD64 was designed (early 2000s), CPUs with uop caches weren't a thing. (Intel P4 with a trace cache did exist, but in hindsight it was regarded as a mistake.) Instruction fetch/decode happens in chunks of up-to-16 bytes, so having one instruction that's nearly 16 bytes isn't much better for the front-end than `movabs $imm64, %reg`.
Of course if the back-end isn't keeping up with the front-end, that bubble of only 1 instruction decoded this cycle can be hidden by buffering between stages.
Keeping track of that much data for one instruction would also be a problem. The CPU has to put that data somewhere, and if there's a 64-bit immediate and a 32-bit displacement in the addressing mode, that's a lot of bits. Normally an instruction needs at most 64 bits of space for an imm32 + a disp32.
BTW, there are special no-ModRM opcodes for most operations with RAX and an immediate. (x86-64 evolved out of 8086, where AX/AL was more special, see this for more history and explanation). It would have been a plausible design for those `add/sub/cmp/and/or/xor/... rax, sign_extended_imm32` forms with no ModRM to instead use a full imm64. The most common case for RAX with an immediate uses an 8-bit sign-extended immediate (-128..127), not this form anyway, and it only saves 1 byte for instructions that need a 4-byte immediate. If you do need an 8-byte constant, though, putting it in a register or memory for reuse would be better than doing a 10-byte and-imm64 in a loop.