You clobber memory but don't tell GCC about it, so GCC can cache values in buf
across assembly calls. If you want to use inputs and outputs, tell GCC about everything.
__asm__ (
    "movq %1, 0(%0)\n\t"
    "movq %2, 8(%0)"
    :                                /* Outputs (none) */
    : "r"(buf), "r"(rrax), "r"(rrbx) /* Inputs */
    : "memory");                     /* Clobbered */
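To see the same operand list in context, here is a minimal sketch wrapped in a function. The name store_pair and the plain C fallback branch are illustrative assumptions, not part of the code in the question; the assembly path assumes GCC or Clang targeting x86-64 with the default AT&T syntax.

```c
#include <stdint.h>

/* Hypothetical helper: stores a at buf[0] and b at buf[1].
   All data flow is declared, so GCC picks the registers itself. */
static void store_pair(int64_t *buf, int64_t a, int64_t b)
{
#if defined(__x86_64__) && defined(__LP64__)
    __asm__ ("movq %1, 0(%0)\n\t"
             "movq %2, 8(%0)"
             :                             /* Outputs (none) */
             : "r"(buf), "r"(a), "r"(b)    /* Inputs */
             : "memory");                  /* Clobbered */
#else
    /* Plain C fallback so the sketch still builds on other targets. */
    buf[0] = a;
    buf[1] = b;
#endif
}
```

Calling store_pair(buf, 0x34, 0x39) and then reading buf[0] and buf[1] gives the stored values, because the "memory" clobber stops GCC from reusing stale cached copies of buf.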
You also generally want to let GCC handle most of the mov instructions, register selection, and so on. Even if you explicitly constrain the registers (rrax is still %rax), let the information flow through GCC or you will get unexpected results.
__volatile__ is wrong.
The reason __volatile__ exists is so you can guarantee that the compiler places your code exactly where it is, which is a completely unnecessary guarantee for this code. It's necessary for implementing advanced features such as memory barriers, but almost completely worthless if you are only modifying memory and registers.
GCC already knows that it can't move this assembly after printf because the printf call accesses buf, and buf could be clobbered by the assembly. GCC already knows that it can't move the assembly before rrax=0x39; because rrax is an input to the assembly code. So what does __volatile__ get you? Nothing.
If your code does not work without __volatile__ then there is an error in the code which should be fixed instead of just adding __volatile__ and hoping that makes everything better. The __volatile__ keyword is not magic and should not be treated as such.
Alternative fix:
Is __volatile__ necessary for your original code? No. Just mark the inputs and clobber values correctly.
/* The "S" constraint means %rsi, "b" means %rbx, and "a" means %rax.
   The inputs and clobbered values are specified. There is no output,
   so that section is blank. */
rrsi = (long) buf;
__asm__ ("movq %%rax, 0(%%rsi)" : : "a"(rrax), "S"(rrsi) : "memory");
__asm__ ("movq %%rbx, 8(%%rsi)" : : "b"(rrbx), "S"(rrsi) : "memory");
Why __volatile__ doesn't help you here:
rrax = 0x34; /* Dead code */
GCC is well within its rights to completely delete the above line, since the code in the question above claims that it never uses rrax.
A clearer example
long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    /* No colons, so this is basic asm: registers take a single %. */
    __asm__ __volatile__ ("movq %rax, (global)");
}
The disassembly is more or less what you would expect at -O0:
movl $5, %eax
movq %rax, (global)
But with optimization off, you can be fairly sloppy about assembly. Let's try -O2:
movq %rax, (global)
Whoops! Where did rax = 5; go? It's dead code, since %rax is never used in the function, at least as far as GCC knows. GCC doesn't peek inside assembly. What happens when we remove __volatile__?
; empty
Well, you might think __volatile__ is doing you a service by keeping GCC from discarding your precious assembly, but it's just masking the fact that GCC thinks your assembly isn't doing anything. GCC thinks your assembly takes no inputs, produces no outputs, and clobbers no memory. You had better straighten it out:
long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ __volatile__ ("movq %%rax, (global)" : : : "memory");
}
Now we get the following output:
movq %rax, (global)
Better. But if you tell GCC about the inputs, it will make sure that %rax is properly initialized first:
long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ ("movq %%rax, (global)" : : "a"(rax) : "memory");
}
The output, with optimizations:
movl $5, %eax
movq %rax, (global)
Correct! And we don't even need to use __volatile__.
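A narrower way to describe the same store, offered here as a sketch rather than as part of the original fix, is to declare global as a memory output operand. GCC then knows exactly which object the assembly writes, so the blanket "memory" clobber and the explicit register variable are both unnecessary. The name store_value and the plain C fallback are assumptions for illustration; the assembly path assumes GCC or Clang on x86-64 with 64-bit long.

```c
long global;

/* Hypothetical variant of store_5: the destination is an "=m" output
   operand, so GCC knows precisely which object the assembly writes. */
void store_value(long value)
{
#if defined(__x86_64__) && defined(__LP64__)
    __asm__ ("movq %1, %0"
             : "=m"(global)    /* Output: the memory location of global */
             : "r"(value));    /* Input: any general-purpose register */
#else
    global = value;            /* Plain C fallback for other targets */
#endif
}
```

Because the statement has a declared output, GCC is free to delete it when global is provably never read afterwards, which is the behavior you want from ordinary data-flow code.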
Why does __volatile__ exist?
The primary correct use for __volatile__ is if your assembly code does something else besides input, output, or clobbering memory. Perhaps it messes with special registers which GCC doesn't know about, or affects IO. You see it a lot in the Linux kernel, but it's misused very often in user space.
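One place where the keyword conventionally appears is a compiler-level barrier, where the statement's position is its whole purpose. The sketch below is written in the style of the Linux kernel's barrier() macro; the names compiler_barrier, ready, payload, and publish are hypothetical. Note that it only constrains the compiler, not the CPU, so it is not a hardware memory fence.

```c
/* Compiler-only barrier: emits no instructions, but the "memory" clobber
   stops GCC from caching memory values in registers or moving memory
   accesses across this point. */
#define compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

static int ready;    /* hypothetical flag */
static int payload;  /* hypothetical data guarded by the flag */

/* Hypothetical publisher: the compiler must emit the payload store
   before the flag store. */
static void publish(int value)
{
    payload = value;
    compiler_barrier();
    ready = 1;
}
```

After publish(42) returns, payload is 42 and ready is 1; the barrier only affects the order in which the compiler emits the two stores.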
The __volatile__ keyword is very tempting because we C programmers often like to think we're almost programming in assembly language already. We're not. C compilers do a lot of data flow analysis, so you need to explain the data flow to the compiler for your assembly code. That way, the compiler can safely manipulate your chunk of assembly just like it manipulates the assembly that it generates.
If you find yourself using __volatile__ a lot, as an alternative you could write an entire function or module in an assembly file.