Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - gcc -O0 still optimizes out "unused" code. Is there a compile flag to change that?

As I brought up in this question, gcc is removing (yes, with -O0) a line of code _mm_div_ss(s1, s2); presumably because the result is not saved. However, this should trigger a floating point exception and raise SIGFPE, which can't happen if the call is removed.

Question: Is there a flag, or a set of flags, to pass to gcc so that the code is compiled as-is? I'm thinking of something like -fno-remove-unused, but I'm not seeing anything like that. Ideally this would be a compiler flag instead of having to change my source code, but if that isn't supported, is there some gcc attribute/pragma to use instead?

Things I've tried:

$ gcc --help=optimizers | grep -i remove

no results.

$ gcc --help=optimizers | grep -i unused

no results.

And explicitly disabling all dead code/elimination flags -- note that there is no warning about unused code:

$ gcc -O0 -msse2 -Wall -Wextra -pedantic -Winline \
     -fno-dce -fno-dse -fno-tree-dce \
     -fno-tree-dse -fno-tree-fre -fno-compare-elim -fno-gcse \
     -fno-gcse-after-reload -fno-gcse-las -fno-rerun-cse-after-loop \
     -fno-tree-builtin-call-dce -fno-tree-cselim a.c
a.c: In function ‘main’:
a.c:25:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
     __m128 s1, s2;
     ^
$

Source program

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <signal.h>
#include <string.h>
#include <xmmintrin.h>

static void sigaction_sfpe(int signal, siginfo_t *si, void *arg)
{
    printf("%d,%d,%d\n", signal, si!=NULL?1:0, arg!=NULL?1:0);
    printf("inside SIGFPE handler\nexit now.\n");
    exit(1);
}

int main()
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigaction_sfpe;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &sa, NULL);

    _mm_setcsr(0x00001D80); /* default MXCSR (0x1F80) with the divide-by-zero mask bit cleared */

    __m128 s1, s2;
    s1 = _mm_set_ps(1.0, 1.0, 1.0, 1.0);
    s2 = _mm_set_ps(0.0, 0.0, 0.0, 0.0);
    _mm_div_ss(s1, s2);

    printf("done (no error).\n");

    return 0;
}

Compiling and running the above program gives

$ ./a.out
done (no error).

Changing the line

_mm_div_ss(s1, s2);

to

s2 = _mm_div_ss(s1, s2); // add "s2 = "

produces the expected result:

$ ./a.out
inside SIGFPE handler

Edit with more details.

This appears to be related to the __always_inline__ attribute on the _mm_div_ss definition.

$ cat t.c
int
div(int b)
{
    return 1/b;
}

int main()
{
    div(0);
    return 0;
}


$ gcc -O0 -Wall -Wextra -pedantic -Winline t.c -o t.out
$  

(no warnings or errors)

$ ./t.out
Floating point exception
$

versus the version below (identical except for the function attributes):

$ cat t.c
__inline int __attribute__((__always_inline__))
div(int b)
{
    return 1/b;
}

int main()
{
    div(0);
    return 0;
}

$ gcc -O0 -Wall -Wextra -pedantic -Winline t.c -o t.out
$   

(no warnings or errors)

$ ./t.out
$

Adding the function attribute __warn_unused_result__ at least gives a helpful message:

$ gcc -O0 -Wall -Wextra -pedantic -Winline t.c -o t.out
t.c: In function ‘main’:
t.c:9:5: warning: ignoring return value of ‘div’, declared with attribute warn_unused_result [-Wunused-result]
     div(0);
     ^
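
The attributed version of t.c is not shown above, so the following is only a minimal sketch of how the attribute can sit next to __always_inline__. The names one_over and checked_inverse are hypothetical (one_over avoids a clash with div from <stdlib.h>), and the function is made static here:

```c
/* Hypothetical variant of div() from t.c; the name one_over is an assumption. */
static __inline int
__attribute__((__always_inline__, __warn_unused_result__))
one_over(int b)
{
    return 1 / b;
}

/* The result is used here, so -Wunused-result has nothing to report. */
static int checked_inverse(int b)
{
    int r = one_over(b);
    return r;
}
```

A bare call such as one_over(0); is the form that draws the -Wunused-result warning.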

edit:

There was some discussion of this on the gcc mailing list. Ultimately, I think everything is working as intended.

1 Answer

Why does gcc not emit the specified instruction?

A compiler produces code that must have the observable behavior specified by the Standard. Anything that is not observable can be changed (and optimized) at will, as it does not change the behavior of the program (as specified).

How can you beat it into submission?

The trick is to make the compiler believe that the behavior of the particular piece of code is actually observable.
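
One portable way to do that, separate from the inline-assembly approach, is to store the result in a volatile object: accesses to volatile objects count as observable behavior, so the value has to be computed. A minimal sketch, assuming the hypothetical names sink, one_over and keep_division:

```c
/* Hypothetical sink: a write to a volatile object is observable behavior,
 * so the value stored into it must actually be computed. */
static volatile int sink;

static __inline int __attribute__((__always_inline__))
one_over(int b)
{
    return 1 / b;
}

/* Performs the division and keeps it alive by storing the result. */
static void keep_division(int b)
{
    sink = one_over(b);
}
```

This does change the source, which the question hoped to avoid, but it needs no compiler extension beyond the attribute already in use.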

Since this is a problem frequently encountered in micro-benchmarks, I advise you to look at how (for example) Google-Benchmark addresses it. From benchmark_api.h we get:

template <class Tp>
inline void DoNotOptimize(Tp const& value) {
    asm volatile("" : : "g"(value) : "memory");
}

The details of this syntax are boring; for our purposes we only need to know:

  • "g"(value) tells the compiler that value is used as an input to the statement
  • "memory" is a compile-time read/write barrier
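
The template above is C++; in C the same idea can be written as a macro. A minimal sketch, assuming GCC or Clang (extended asm is a compiler extension) and the hypothetical names DO_NOT_OPTIMIZE and sum_below:

```c
/* Hypothetical C counterpart of DoNotOptimize (the name is an assumption).
 * The empty asm emits no instructions; the "g" input makes the compiler
 * treat `value` as used, and "memory" acts as a compile-time barrier. */
#define DO_NOT_OPTIMIZE(value) __asm__ volatile("" : : "g"(value) : "memory")

/* Sums 0..n-1; every partial sum is handed to the barrier so it counts as used. */
static unsigned long sum_below(unsigned long n)
{
    unsigned long total = 0;
    unsigned long i;

    for (i = 0; i < n; ++i) {
        total += i;
        DO_NOT_OPTIMIZE(total);
    }
    return total;
}
```

The macro only constrains what the compiler may assume; it does not change the computed value.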

So, we can change the code to:

asm volatile("" : : : "memory");

__m128 result = _mm_div_ss(s1, s2);

asm volatile("" : : "g"(result) : );

This:

  • forces the compiler to assume that memory (and therefore s1 and s2, when they live in memory) may have been modified between their initialization and use
  • forces the compiler to treat the result of the operation as used

There is no need for any flag, and it should work at any level of optimization (I tested it on https://gcc.godbolt.org/ at -O3).
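
Put together with the question's setup, the two barriers might be used as in the sketch below. It assumes x86 with SSE, a POSIX system, and GCC or Clang. The names recover_point, on_sigfpe and division_traps are hypothetical, and the volatile inputs and the sigsetjmp recovery are additions for illustration that appear in neither the question nor the answer above:

```c
#define _POSIX_C_SOURCE 200809L  /* sigaction/sigsetjmp under strict -std= modes */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <xmmintrin.h>

static sigjmp_buf recover_point;  /* hypothetical name */

/* Jump back into division_traps() instead of returning to the faulting divide. */
static void on_sigfpe(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if the division raised SIGFPE, 0 if it completed without trapping. */
static int division_traps(void)
{
    /* Assumption: volatile inputs, so the division cannot be folded at compile time. */
    volatile float one = 1.0f, zero = 0.0f;
    unsigned int saved_csr = _mm_getcsr();
    struct sigaction sa, saved_sa;
    int trapped;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigfpe;
    sigaction(SIGFPE, &sa, &saved_sa);

    if (sigsetjmp(recover_point, 1) == 0) {
        __m128 s1 = _mm_set_ps1(one);
        __m128 s2 = _mm_set_ps1(zero);
        __m128 result;

        _mm_setcsr(0x00001D80);               /* as in the question: unmask divide-by-zero */
        __asm__ volatile("" : : : "memory");  /* barrier before the division */
        result = _mm_div_ss(s1, s2);
        __asm__ volatile("" : : "g"(result)); /* the result counts as used */
        trapped = 0;
    } else {
        trapped = 1;                          /* reached via siglongjmp from the handler */
    }

    _mm_setcsr(saved_csr);                    /* restore the caller's MXCSR and handler */
    sigaction(SIGFPE, &saved_sa, NULL);
    return trapped;
}
```

On a 32-bit x86 target, -msse (or -msse2, as in the question) is needed on the command line. If the division is kept, division_traps() should report the trap.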

