c - Why does the clang 6.0 compiler optimize by starting indexes at -N and counting to zero, but clang 11.0 starts at 0 and counts to N?

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

For the code below, clang 6.0 and 11.0 have a subtle difference in their compiled assembly.

#include <stdint.h>

#define SIZE (1L << 16)
    
void test(uint8_t * restrict a, uint8_t * restrict b) {
  uint64_t i;

  for (i = 0; i < SIZE; i++) {
    a[i] += b[i];
  } 
}

When I compile with arguments -O1 in clang 6.0, I get the following output:

test:                                   # @test
        mov     rax, -65536
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movzx   ecx, byte ptr [rsi + rax + 65536]
        add     byte ptr [rdi + rax + 65536], cl
        add     rax, 1
        jne     .LBB0_1
        ret

Notice that the compiler changes the loop index from '0 to 65536' to '-65536 to 0'. I thought this was very clever, because it makes use of the fact that the add instruction sets the zero flag (ZF) when its result is zero, so the loop needs no separate cmp, saving one instruction per iteration. Unfortunately, when I compile the same code with the same arguments in clang 11.0, I get the following output:

test:                                   # @test
        xor     eax, eax
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movzx   ecx, byte ptr [rsi + rax]
        add     byte ptr [rdi + rax], cl
        add     rax, 1
        cmp     rax, 65536
        jne     .LBB0_1
        ret

Notice that this time it keeps the '0 to 65536' index and adds a cmp instruction at the end of each iteration. Also, although this is a specific example, the behavior is not unique to the code I wrote. It persists with -O3 and vectorization enabled as well.

What gives? Was the original optimization not actually effective? Did processors change to obviate the trick?
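For reference, the two loop shapes compared above can be written out at the C source level. This is only a rough sketch of the idea, not what clang does internally: the names add_count_up, add_count_to_zero and loops_agree are hypothetical, and the second function uses end pointers where the clang 6.0 listing instead keeps the original base pointers and folds +65536 into the address displacement.

```c
#include <stdint.h>
#include <string.h>

#define SIZE (1L << 16)

/* Index runs 0 .. SIZE-1 and is compared against SIZE each iteration
   (the shape of the clang 11.0 output). */
void add_count_up(uint8_t *restrict a, const uint8_t *restrict b) {
  for (uint64_t i = 0; i < SIZE; i++) {
    a[i] += b[i];
  }
}

/* Index runs -SIZE .. -1 and the loop stops when it reaches zero
   (the shape of the clang 6.0 output). Indexing from one-past-the-end
   pointers with a negative offset touches the same elements in the
   same order as the loop above. */
void add_count_to_zero(uint8_t *restrict a, const uint8_t *restrict b) {
  uint8_t *a_end = a + SIZE;
  const uint8_t *b_end = b + SIZE;
  for (int64_t i = -SIZE; i != 0; i++) {
    a_end[i] += b_end[i];
  }
}

/* Hypothetical helper: returns 1 if both loop shapes give identical
   results on the same sample data, 0 otherwise. */
int loops_agree(void) {
  static uint8_t a1[SIZE], a2[SIZE], b[SIZE];
  for (long i = 0; i < SIZE; i++) {
    a1[i] = a2[i] = (uint8_t)(i * 7);
    b[i] = (uint8_t)(i * 13 + 1);
  }
  add_count_up(a1, b);
  add_count_to_zero(a2, b);
  return memcmp(a1, a2, SIZE) == 0;
}
```

In the second form the exit test is "did the increment produce zero", which is what lets the add in the clang 6.0 listing feed jne directly without a separate cmp.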

question from:https://stackoverflow.com/questions/65891676/why-does-the-clang-6-0-compiler-optimize-by-starting-indexes-at-n-and-counting

