Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

c++ - Avoid optimizing away variable with inline asm

I was reading "Preventing compiler optimizations while benchmarking", which describes how clobber() and escape() from Chandler Carruth's CppCon 2015 talk, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!", affect the compiler.
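For reference, here is a sketch of the two helpers as they are commonly reproduced from the talk. The usage function storeAndReload() is a hypothetical addition to show them in context; it is not from the talk or the article.

```cpp
// escape(p): hands the pointer to an empty asm statement. Combined with the
// "memory" clobber, the compiler must assume the asm may read or write *p,
// so the pointed-to object has to be in memory holding its current value.
static void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// clobber(): an empty asm statement with only a "memory" clobber, telling the
// compiler that any memory reachable by the asm may have been read or written.
static void clobber() {
    asm volatile("" : : : "memory");
}

// Hypothetical usage: the store of 42 must be emitted because value's
// address escapes, and after clobber() the return has to reload it.
int storeAndReload() {
    int value = 42;
    escape(&value);
    clobber();
    return value;
}
```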

From reading that, I assumed that if I have an input constraint like "g"(val), then the compiler wouldn't be able to optimize away val. But in g() below, no code is generated. Why?

How can doNotOptimize() be rewritten to ensure code is generated for g()?

template <typename T>
void doNotOptimize(T const& val) {
  asm volatile("" : : "g"(val) : "memory");
}

void f() {
  char x = 1;
  doNotOptimize(&x);    // x is NOT optimized away
}

void g() {
  char x = 1;
  doNotOptimize(x);     // x is optimized away
}

https://godbolt.org/g/Ndd56K


1 Answer


What, exactly, would it mean to have code generated for g()? If you were writing it yourself, what code would you write? Seriously, this is a real question. You have to decide what output you're expecting before you can start cajoling it from the compiler.

Anyway, let's look at what you have now. In f(),

void f() {
  char x = 1;
  doNotOptimize(&x);    // x is NOT optimized away
}

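you are passing the address of x into the asm statement. Because that pointer escapes into the asm, and the "memory" clobber says the asm may read or write arbitrary memory, the optimizer cannot keep x only in a register: it has to be allocated in memory so that it has an address, and the value 1 has to be stored there before the asm runs.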

However, in g(),

void g() {
  char x = 1;
  doNotOptimize(x);     // x is optimized away
}

x is just a local variable, and any sane optimizer will keep it in a register or, in this case, treat it as a constant. This is allowed because you never take its address; you only use its value. So, for example, the compiler might generate code like this:

g():
    mov  al, 1      ; store 1 in the byte-sized register AL
    ...

Or, as in this case, it might generate no code at all and substitute the constant value for every use of the variable.

Your doNotOptimize code,

template <typename T>
void doNotOptimize(T const& val) {
  asm volatile("" : : "g"(val) : "memory");
}

uses the g constraint for the val parameter, which lets the compiler supply the operand in a general-purpose register, in memory, or as an immediate constant, whichever the optimizer finds most convenient. Since val is a constant, when this call is inlined the optimizer leaves it as an immediate, and an empty asm template with an immediate operand produces no instructions. Your "memory" clobber has no effect on x, because x's address never escapes: the compiler knows the asm cannot read or modify x through memory.

So what can we do? Well, we can force the variable x to be allocated in memory, even though it doesn't need to be, by using the m constraint:

template <typename T>
void doNotOptimize(T const& val) {
  asm volatile("" : : "m"(val) : "memory");
}

void g() {
  char x = 1;
  doNotOptimize(x);
}

Now the compiler can't optimize the store of x away and is forced to emit the following code:

g():
    mov  BYTE PTR [rsp-1], 1
    ret

Note that this is basically the same effect that declaring the x variable volatile would have.
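A minimal sketch of that volatile alternative follows. The function name gVolatile() is hypothetical, and it returns x only so the effect can be observed; that return adds a load which the "m" version of g() does not have.

```cpp
// Every access to a volatile object is an observable side effect, so the
// store of 1 cannot be removed, and the read in the return must be performed.
char gVolatile() {
    volatile char x = 1;
    return x;
}
```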

Remember the question I asked at the beginning? Is that the output you wanted?

Or maybe you want the compiler to emit that immediate-to-register move. If so, the r constraint will work, as will any of the x86-specific constraints (such as a, which selects the A register) that let you dictate a particular register. This forces the optimizer to put the value in a register, even though it doesn't need to be there:

g():
    mov     eax, 1
    ret
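The source for that register variant is not shown above; a sketch of what it might look like is below. The names doNotOptimizeReg() and gReg() are hypothetical. One design difference from the "m" version: "m" accepts any lvalue, whereas "r" only makes sense for values small enough to fit in a register.

```cpp
// Hypothetical variant of doNotOptimize() using the "r" constraint: the
// operand must be materialized in a general-purpose register before the
// (empty) asm statement runs.
template <typename T>
void doNotOptimizeReg(T const& val) {
    asm volatile("" : : "r"(val) : "memory");
}

void gReg() {
    char x = 1;
    doNotOptimizeReg(x);  // expect a register load of 1 here
}
```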

I cannot, however, see what the point of either of these would be.

If you wanted to craft a microbenchmark that tested the overhead of calling a function with a single const-reference parameter, then a better option would be to ensure that the definition of the function being called is not visible to the optimizer. Then, it can't inline that function and has to arrange for the call to be made, including all necessary setup. This also works well if you're just studying how a compiler might emit that code. (Naturally, you can't use a template function, though. Well, unless you wanted to abuse C++11's extern templates.)
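Within a single file you cannot hide a definition from the optimizer, so the sketch below swaps in the GCC/Clang noinline attribute instead. That is a weaker guarantee than putting the definition in a separate translation unit, as suggested above. The names takesConstRef() and callSite() are hypothetical.

```cpp
// Hypothetical function under test, taking a single const-reference parameter.
// [[gnu::noinline]] keeps it from being inlined into callers; other
// interprocedural optimizations may still apply, which is why a definition in
// a separate .cpp file is the more reliable way to hide it.
[[gnu::noinline]] int takesConstRef(char const& c) {
    return c + 1;
}

int callSite() {
    char x = 1;
    return takesConstRef(x);  // inspect the call and its argument setup here
}
```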



...