why can't I use multiple higher bytes in a register
Every variant of an instruction needs its own encoding in the instruction bytes. The original 8086 processor supports the following options:
instruction        encoding       remarks
---------------------------------------------------------
mov ax,value       b8 01 00       <-- whole register
mov al,value       b0 01          <-- lower byte
mov ah,value       b4 01          <-- upper byte
Because the 8086 is a 16-bit processor, these three versions cover all options.
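To make the encoding scheme concrete, here is a minimal C sketch of how the register number is folded into the opcode byte for "mov reg, immediate" on the 8086 (B0+r for byte registers, B8+r for word registers). The function and enum names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8086 register numbers as they appear in the low 3 bits of the opcode. */
enum { AL, CL, DL, BL, AH, CH, DH, BH };   /* byte registers */
enum { AX, CX, DX, BX, SP, BP, SI, DI };   /* word registers */

/* mov r8, imm8: opcode B0+r followed by one immediate byte. */
size_t encode_mov_r8_imm8(uint8_t *out, unsigned reg, uint8_t imm)
{
    assert(reg < 8);
    out[0] = (uint8_t)(0xB0 + reg);
    out[1] = imm;
    return 2;                               /* bytes written */
}

/* mov r16, imm16: opcode B8+r followed by a little-endian 16-bit immediate. */
size_t encode_mov_r16_imm16(uint8_t *out, unsigned reg, uint16_t imm)
{
    assert(reg < 8);
    out[0] = (uint8_t)(0xB8 + reg);
    out[1] = (uint8_t)(imm & 0xFF);
    out[2] = (uint8_t)(imm >> 8);
    return 3;                               /* bytes written */
}
```

With only 3 bits for the register number, there are exactly 8 byte-register slots and 8 word-register slots, and ah..bh occupy four of the byte slots.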
In the 80386, 32-bit support was added. The designers had a choice: either add support for 3 additional sets of partial registers (x 8 registers = 24 new registers) and somehow find encodings for these, or leave things mostly as they were before.
Here's what the designers opted for:
instruction        encoding            remarks
---------------------------------------------------------
mov eax,value      b8 01 00 00 00      (same encoding as mov ax,value!)
mov ax,value       66 b8 01 00         (prefix 66 + encoding for mov eax,value)
mov al,value                           (same as before)
mov ah,value                           (same as before)
They simply added a 0x66 prefix to change the operand size from the (now) default 32 bits to 16 bits, plus a 0x67 prefix to change the address size used for memory operands, and left it at that.
Doing otherwise would have meant doubling the number of instruction encodings, or adding six new prefixes for your 'new' partial registers.
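As a rough sketch of the prefix approach, the C function below encodes "mov reg, immediate" the way a 32-bit code segment expects it: the opcode byte is the same for 16 and 32 bits, and only the 0x66 prefix and the immediate length differ. The function name and signature are hypothetical, chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "mov reg, imm" for a 32-bit code segment.
 * reg  : 3-bit register number (0 = ax/eax)
 * bits : operand size, 16 or 32
 * Returns the number of bytes written, or 0 for unsupported arguments. */
size_t encode_mov_imm_pm32(uint8_t *out, unsigned reg, unsigned bits, uint32_t imm)
{
    size_t n = 0;
    if ((bits != 16 && bits != 32) || reg > 7)
        return 0;
    if (bits == 16)
        out[n++] = 0x66;                        /* operand-size override prefix */
    out[n++] = (uint8_t)(0xB8 + reg);           /* same opcode for both sizes */
    for (unsigned i = 0; i < bits / 8; i++)
        out[n++] = (uint8_t)(imm >> (8 * i));   /* little-endian immediate */
    return n;
}
```

One prefix byte covers every instruction with a 16/32-bit operand, which is why no new opcodes were needed.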
By the time the 80386 came out, all instruction bytes were already taken, so there was no space for new prefixes. This opcode space had been eaten up by useless instructions like AAA, AAD, AAM, AAS, DAA, DAS and SALC. (These have been disabled in x64 mode to free up much-needed encoding space.)
If you want to change only the higher bytes of a register, simply do:
movzx eax,cl //like mov al,cl, but faster (writes the whole register)
shl eax,24 //move al into the high byte of eax; the lower 24 bits are now zero
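In C terms, the two instructions above compute a shifted byte with zeros below it. If the lower bytes must survive, a mask is needed as well. The sketch below models both cases on a plain 32-bit integer; the helper names are invented for this example.

```c
#include <stdint.h>

/* What movzx eax,cl / shl eax,24 computes: cl in bits 24..31, zeros below. */
uint32_t byte_to_top(uint8_t cl)
{
    return (uint32_t)cl << 24;
}

/* Replace only the top byte of eax and keep bits 0..23 unchanged. */
uint32_t set_top_byte(uint32_t eax, uint8_t cl)
{
    return (eax & 0x00FFFFFFu) | ((uint32_t)cl << 24);
}
```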
But why not two (say r8dl and r8dh)
In the original 8086 there were 8 byte-sized registers:
al,cl,dl,bl,ah,ch,dh,bh <-- in this order.
The index registers, base pointer and stack pointer do not have byte registers.
In x64 this was changed. If there is a REX prefix (denoting x64 registers), then the 8 encodings for al..bh, together with 1 extra encoding bit from the REX prefix, select one of 16 byte registers, al..r15l. This adds spl, dil, sil and bpl, but excludes any xh reg. (You can still get the four xh regs when not using a REX prefix.)
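The mapping can be written down as a small lookup, sketched here in C. The function name is made up for illustration; the register names use the r8b..r15b spelling.

```c
#include <stddef.h>

/* Name of the byte register selected by a 3-bit register field.
 * has_rex : nonzero if the instruction carries a REX prefix
 * ext     : the extension bit taken from the REX prefix (0 or 1)
 * Returns NULL for combinations that cannot be encoded. */
const char *byte_reg_name(unsigned reg, int has_rex, unsigned ext)
{
    static const char *const legacy[8] = {
        "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
    };
    static const char *const with_rex[16] = {
        "al",  "cl",  "dl",   "bl",   "spl",  "bpl",  "sil",  "dil",
        "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"
    };
    if (reg > 7 || ext > 1)
        return NULL;
    if (!has_rex)
        return ext ? NULL : legacy[reg];   /* no REX, so no extension bit */
    return with_rex[(ext << 3) | reg];
}
```

Encodings 4..7 are the ones that change meaning: ah, ch, dh, bh without REX, and spl, bpl, sil, dil with it.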
And using r8b makes the complete r8 "busy"
Yes, this is called a 'partial register write'. Because writing r8b changes part, but not all, of r8, r8 is now split into two parts: one part has changed and the other has not. The CPU needs to join the two parts. It can either do this by using an extra CPU cycle to perform the work, or by adding more circuitry to the task to be able to do it in a single cycle.
The latter is expensive in terms of silicon and complex in terms of design; it also adds extra heat because of the extra work being done (more work per cycle = more heat produced). See "Why doesn't GCC use partial registers?" for a run-down on how different x86 CPUs handle partial-register writes (and later reads of the full register).
if I use r8b I can't access upper 56 bits at the same time, they exist, but unaccessible
No, they are not inaccessible.
mov rax,bignumber //random value in rax
mov al,0 //clear al (low byte of rax)
xor r8d,r8d //r8=0
mov r8b,16 //set r8b
or r8,rax //change the upper 56 bits of r8 without changing r8b
You use masks plus and, or, xor and not-and (and with an inverted mask) to change parts of a register without affecting the rest of it.
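The same masking idea, modelled in C on a 64-bit integer standing in for r8. This is only a sketch; the macro and function names are assumptions made for this example.

```c
#include <stdint.h>

#define LOW_BYTE_MASK UINT64_C(0xFF)

/* Replace bits 8..63 of reg with those of upper; keep bits 0..7 (the r8b part). */
uint64_t set_upper_56(uint64_t reg, uint64_t upper)
{
    return (reg & LOW_BYTE_MASK) | (upper & ~LOW_BYTE_MASK);
}

/* Replace bits 0..7 and keep bits 8..63, which is the effect of writing r8b. */
uint64_t set_low_byte(uint64_t reg, uint8_t value)
{
    return (reg & ~LOW_BYTE_MASK) | value;
}
```

For example, set_upper_56(16, x) keeps the 16 in the low byte and takes everything else from x, mirroring the or r8,rax step after al has been cleared.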
There really was never a need for ah, but it did lead to more compact code on 8086 (and effectively more usable registers). It's still sometimes useful to write EAX or RAX and then read AL and AH separately (e.g. movzx ecx, al / movzx edx, ah) as part of unpacking bytes.
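For comparison, this is what that unpacking looks like in C, where the compiler is free to pick movzx from al and ah or a shift and mask. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bits 0..7 of the value, i.e. what movzx ecx, al reads. */
uint8_t low_byte(uint32_t eax)
{
    return (uint8_t)(eax & 0xFF);
}

/* Bits 8..15 of the value, i.e. what movzx edx, ah reads. */
uint8_t second_byte(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xFF);
}
```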