The key to understanding inline asm is to understand that each asm statement has two parts:
- The text of the actual assembler stuff, in which the compiler will make textual substitutions, but which it does not understand. This is the AssemblerTemplate in the documentation (everything up to the first : in the __asm__()).
- A description of what the assembler stuff does, in terms that the compiler does understand. This is the : OutputOperands : InputOperands : Clobbers in the documentation.
This description must tell the compiler how the assembler fits in with all the code which the compiler is generating around it. The code generator is busy allocating registers to hold values, deciding what order to do things in, moving things out of loops, eliminating unused fragments of code, discarding values it no longer needs, and so on.
The actual assembler is a black box which takes the inputs described here, produces the outputs described and as a side effect may 'clobber' some registers and/or memory. This must be a complete description of what the assembler does... otherwise the compiler-generated asm around your template will clash with it and rely on false assumptions.
Armed with this information the compiler can decide what registers the assembler can use, and you should let it do that.
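To make the two parts concrete, here is a minimal sketch for x86-64 with GCC or Clang. The helper name add_asm is hypothetical (it is not from the question), and the non-x86-64 branch is an assumption added only so the sketch builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b using a single add instruction. */
static uint64_t add_asm(uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__("add %1, %0"   /* AssemblerTemplate: %0 and %1 are substituted textually */
            : "+r"(a)      /* OutputOperands: a is read and written, in some register */
            : "r"(b)       /* InputOperands: b is only read, in some register */
            : "cc");       /* Clobbers: add changes the condition flags */
    return a;
#else
    /* Assumed fallback so the sketch compiles on other architectures. */
    return a + b;
#endif
}
```

Note that the template never names a register: the compiler picks them, and the three colon-separated lists are the only thing it knows about the instruction.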
So, your fragment:
asm volatile(
    "movq %0 , %%rax\n"
    "rol %%rax\n"
    "rol %%rax\n"
    :"=r"(X)
    :"r"(X)
);
has a few "issues":
- you may have chosen %rax for the result on the basis that the asm() is like a function, and may be expected to return a result in %rax -- but that isn't so.
- you went ahead and used %rax, which the compiler may (well) already have allocated to something else... so you are, effectively, 'clobbering' %rax but you have failed to tell the compiler about it!
- you specified "=r"(X) (OutputOperand) which tells the compiler to expect an output in some register and that output will be the new value of the variable X. The %0 in the AssemblerTemplate will be replaced by the register selected for the output. Sadly, your assembly treats %0 as the input :-( And the output is, in fact, in %rax -- which, as above, the compiler is unaware of.
- you also specified "r"(X) (InputOperand) which tells the compiler to arrange for the current value of the variable X to be placed in some register for the assembler to use. This would be %1 in the AssemblerTemplate. Sadly, your assembly does not use this input. Even though the output and input operands both refer to X, the compiler is not obliged to make %0 the same register as %1. (This allows it to use the asm block as a non-destructive operation that leaves the original value of an input unmodified. If this isn't how your template works, don't write it that way.)
- generally you don't need the volatile when all the inputs and outputs are properly described by constraints. One of the fine things the compiler will do is discard an asm() if (all of) the output(s) are not used... volatile tells the compiler not to do that (and tells it a number of other things... see the manual).
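If the template really must use %rax, there are at least two honest ways to say so. The sketch below is for x86-64 with GCC or Clang; both function names are hypothetical, and the fallback branches are an assumption added only so it builds on other targets. The first lists rax as a clobber, the second pins the operand to %rax with the "a" constraint.

```c
#include <stdint.h>

/* Hypothetical helper: hard-codes %rax in the template and declares it clobbered. */
static uint64_t rol2_clobber_rax(uint64_t x)
{
    uint64_t out;
#if defined(__x86_64__)
    __asm__("mov %1, %%rax\n\t"   /* read the input ...            */
            "rol %%rax\n\t"
            "rol %%rax\n\t"
            "mov %%rax, %0"       /* ... and only then write the output */
            : "=r"(out)
            : "r"(x)
            : "rax", "cc");       /* the compiler now keeps its own values out of %rax */
#else
    out = (x << 2) | (x >> 62);   /* assumed fallback for other targets */
#endif
    return out;
}

/* Hypothetical helper: "+a" makes %rax itself the combined input/output operand. */
static uint64_t rol2_pinned_rax(uint64_t x)
{
#if defined(__x86_64__)
    __asm__("rol %%rax\n\t"
            "rol %%rax"
            : "+a"(x)
            :
            : "cc");
#else
    x = (x << 2) | (x >> 62);     /* assumed fallback for other targets */
#endif
    return x;
}
```

Either form is a complete description, but both take a choice away from the register allocator, which is why letting the compiler pick with "r" is usually preferable.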
Apart from that, things were wonderful. The following is safe, and avoids a mov instruction:
asm("rol %0\n"
    "rol %0\n" : "+r"(X));
where "+r"(X) says that there is one combined input and output register required, taking the old value of X and returning a new one.
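Wrapped in a function, the "+r" form can be checked against known values. This is a sketch for x86-64 with GCC or Clang; the name rol2 is hypothetical and the fallback branch is an assumption so that it also builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 64-bit value left by two bits. */
static uint64_t rol2(uint64_t X)
{
#if defined(__x86_64__)
    /* One register serves as both input and output; the compiler chooses it. */
    __asm__("rol %0\n\t"
            "rol %0"
            : "+r"(X));
#else
    X = (X << 2) | (X >> 62);   /* assumed fallback for other targets */
#endif
    return X;
}
```

With X = 99 (0x63) the result is 396 (0x18c), and the top bit of 0x8000000000000001 wraps around to give 6.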
Now, if you don't want to replace X, then assuming Y is to be the result, you could:
asm("mov %1, %0\n"
    "rol %0\n"
    "rol %0\n" : "=r"(Y) : "r"(X));
But it's better to leave it up to the compiler to decide whether it needs to mov or whether it can just let an input be destroyed.
There are a couple of rules about InputOperands which are worth mentioning:
- The assembler must not overwrite any of the InputOperands -- the compiler is tracking which values it has in which registers, and is expecting InputOperands to be preserved.
- The compiler expects all InputOperands to be read before any OutputOperand is written. This is important when the compiler knows that a given InputOperand is not used again after the asm(), and it can therefore allocate the InputOperand's register to an OutputOperand. There is a thing called earlyclobber ("=&r"(foo)) to deal with this little wrinkle.
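As an illustration of the earlyclobber rule, here is a sketch of a template that writes its output before it has finished reading its inputs. It targets x86-64 with GCC or Clang; the name add_copy is hypothetical, and the fallback branch is an assumption so it builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: out = a + b, computed as "copy a, then add b". */
static uint64_t add_copy(uint64_t a, uint64_t b)
{
    uint64_t out;
#if defined(__x86_64__)
    __asm__("mov %1, %0\n\t"   /* %0 is written here ...              */
            "add %2, %0"       /* ... while %2 has not been read yet  */
            : "=&r"(out)       /* & = earlyclobber: no sharing with an input */
            : "r"(a), "r"(b)
            : "cc");
#else
    out = a + b;               /* assumed fallback for other targets */
#endif
    return out;
}
```

With a plain "=r"(out) the compiler would be free to give out the same register as b, in which case the mov would destroy b before the add reads it.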
In the "=r"(Y) : "r"(X) example above, if you don't in fact use X again the compiler could allocate %0 and %1 to the same register! But the (redundant) mov will still be assembled -- remembering that the compiler really doesn't understand the AssemblerTemplate. So, you are in general better off shuffling values around in the C, not the asm(). See https://gcc.gnu.org/wiki/DontUseInlineAsm and "Best practices for circular shift (rotate) operations in C++".
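For rotates in particular, the usual way to avoid inline asm altogether is a shift-and-or idiom in plain C, which current GCC and Clang generally recognise and compile to a single rotate instruction. A minimal sketch, with the hypothetical name rotl64:

```c
#include <stdint.h>

/* Hypothetical helper: rotate x left by n bits, for any n. */
static inline uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;                             /* reduce the count modulo 64 */
    /* (-n & 63) is 64 - n for n in 1..63 and 0 for n == 0, so neither
       shift count ever reaches 64 (which would be undefined behaviour). */
    return (x << n) | (x >> (-n & 63));
}
```

Because this is ordinary C, the compiler can constant-fold it, pick registers freely and combine it with surrounding code, none of which it can do with an asm() template.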
So here are four variations on a theme, and the code generated (gcc -O2):
// (1) uses both X and Y in the printf() -- does mov %1, %0 in asm()
void Never_Inline footle(void)                Dump of assembler code for function footle:
{                                               mov    $0x492782,%edi   # address of format string
  unsigned long X, Y ;                          xor    %eax,%eax
                                                mov    $0x63,%esi       # X = 99
  X = 99 ;                                      rol    %rsi             # 1st asm
  __asm__(" rol %0\n"                           rol    %rsi
          " rol %0\n" : "+r"(X)                 mov    %rsi,%rdx        # 2nd asm, compiler using it as a copy-and-rotate
      ) ;                                       rol    %rdx
                                                rol    %rdx
  __asm__(" mov %1, %0\n"                       jmpq   0x4010a0 <printf@plt>  # tailcall printf
          " rol %0\n"
          " rol %0\n" : "=r"(Y) : "r"(X)
      ) ;
  printf("%lx %lx\n", X, Y) ;
}
// (2) uses both X and Y in the printf() -- does Y = X in 'C'
void Never_Inline footle(void)                Dump of assembler code for function footle:
{                                               mov    $0x492782,%edi
  unsigned long X, Y ;                          xor    %eax,%eax
                                                mov    $0x63,%esi
  X = 99 ;                                      rol    %rsi             # 1st asm
  __asm__(" rol %0\n"                           rol    %rsi
          " rol %0\n" : "+r"(X)                 mov    %rsi,%rdx        # compiler-generated mov
      ) ;                                       rol    %rdx             # 2nd asm
                                                rol    %rdx
  Y = X ;                                       jmpq   0x4010a0 <printf@plt>
  __asm__(" rol %0\n"
          " rol %0\n" : "+r"(Y)
      ) ;
  printf("%lx %lx\n", X, Y) ;
}
// (3) uses only Y in the printf() -- does mov %1, %0 in asm()
void Never_Inline footle(void)                Dump of assembler code for function footle:
{                                               mov    $0x492782,%edi
  unsigned long X, Y ;                          xor    %eax,%eax
                                                mov    $0x63,%esi
  X = 99 ;                                      rol    %rsi
  __asm__(" rol %0\n"                           rol    %rsi
          " rol %0\n" : "+r"(X)                 mov    %rsi,%rsi        # redundant instruction because of mov in the asm template
      ) ;                                       rol    %rsi
                                                rol    %rsi
  __asm__(" mov %1, %0\n"                       jmpq   0x4010a0 <printf@plt>
          " rol %0\n"
          " rol %0\n" : "=r"(Y) : "r"(X)
      ) ;
  printf("%lx\n", Y) ;
}
// (4) uses only Y in the printf() -- does Y = X in 'C'
void Never_Inline footle(void)                Dump of assembler code for function footle:
{                                               mov    $0x492782,%edi
  unsigned long X, Y ;                          xor    %eax,%eax
                                                mov    $0x63,%esi
  X = 99 ;                                      rol    %rsi
  __asm__(" rol %0\n"                           rol    %rsi
          " rol %0\n" : "+r"(X)                 rol    %rsi             # no wasted mov, compiler picked %0=%1=%rsi
      ) ;                                       rol    %rsi
                                                jmpq   0x4010a0 <printf@plt>
  Y = X ;
  __asm__(" rol %0\n"
          " rol %0\n" : "+r"(Y)
      ) ;
  printf("%lx\n", Y) ;
}
which, hopefully, demonstrates the compiler busily allocating values to registers, tracking which values it needs to hold on to, minimizing register/register moves, and generally being clever.
So the trick is to work with the compiler, understanding that the : OutputOperands : InputOperands : Clobbers is where you are describing what the assembler is doing.