c++ - Is < faster than <=?

asked Mar 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm reading a book where the author says that if( a < 901 ) is faster than if( a <= 900 ).

Not exactly as in this simple example, but supposedly there are slight performance differences in complex loop code.

I suppose this has something to do with the generated machine code, if it's even true.

asked by Vinícius Magalhães Horta, translated from Stack Overflow

1 Answer

深蓝 · Answer 1 · 2021-03-06T03:06:51+0000

No, it will not be faster on most architectures.

You didn't specify an architecture, but on x86 all integer comparisons are typically implemented with two machine instructions:

A test or cmp instruction, which sets EFLAGS
And a Jcc (jump) instruction, depending on the comparison type (and code layout):
- jne - Jump if not equal --> ZF = 0
- jz - Jump if zero (equal) --> ZF = 1
- jg - Jump if greater --> ZF = 0 and SF = OF
- (etc...)

Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c:

    if (a < b) {
        // Do something 1
    }

Compiles to:

    mov     eax, DWORD PTR [esp+24]      ; a
    cmp     eax, DWORD PTR [esp+28]      ; b
    jge     .L2                          ; jump if a is >= b
    ; Do something 1
.L2:

And

    if (a <= b) {
        // Do something 2
    }

Compiles to:

    mov     eax, DWORD PTR [esp+24]      ; a
    cmp     eax, DWORD PTR [esp+28]      ; b
    jg      .L5                          ; jump if a is > b
    ; Do something 2
.L5:
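
Listings like the two above can be reproduced from a small source file. The following is a minimal sketch, assuming hypothetical function names (branch_lt, branch_le, check_branches) that do not appear in the original test.c. Compiling it with the same flags (using g++ instead of gcc) should show the two comparison functions differing only in the conditional jump, although the exact output depends on the compiler version and optimization level.

```cpp
#include <cassert>

// Hypothetical helpers (not from the original test.c): each wraps one
// comparison so that its generated assembly can be inspected in isolation.
int branch_lt(int a, int b) {
    if (a < b) {   // unoptimized x86 code typically skips the body with jge
        return 1;
    }
    return 0;
}

int branch_le(int a, int b) {
    if (a <= b) {  // unoptimized x86 code typically skips the body with jg
        return 1;
    }
    return 0;
}

// Sanity check: the two predicates disagree only when a == b.
void check_branches() {
    assert(branch_lt(1, 2) == 1 && branch_le(1, 2) == 1);
    assert(branch_lt(2, 2) == 0 && branch_le(2, 2) == 1);
    assert(branch_lt(3, 2) == 0 && branch_le(3, 2) == 0);
}
```

With optimization enabled, a compiler may replace the branch with a setcc or cmov sequence, so the cmp/Jcc pair is most visible in unoptimized builds.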

So the only difference between the two is a jg versus a jge instruction.

The two will take the same amount of time.
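
One rough way to check this claim empirically is to time both forms over the same data. The sketch below is a crude micro-benchmark under stated assumptions, not a rigorous measurement: the helper names, the fixed seed, and the value range are all invented for illustration, and results will vary with compiler, flags, and CPU. For an integer a, a < 901 and a <= 900 are the same predicate, so an optimizing compiler may well emit identical code for both loops.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fills a vector with pseudo-random values in [0, 1800].
std::vector<int> make_data(std::size_t n) {
    std::mt19937 gen(12345);  // fixed seed so that runs are repeatable
    std::uniform_int_distribution<int> dist(0, 1800);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}

// Counts elements using the "<" form of the comparison.
std::size_t count_lt_901(const std::vector<int>& v) {
    std::size_t n = 0;
    for (int a : v) {
        if (a < 901) ++n;
    }
    return n;
}

// Counts elements using the "<=" form of the comparison.
std::size_t count_le_900(const std::vector<int>& v) {
    std::size_t n = 0;
    for (int a : v) {
        if (a <= 900) ++n;
    }
    return n;
}

// Times a single call f(v); stores the count in `result` and returns
// the elapsed wall-clock time in nanoseconds.
template <typename F>
long long time_ns(F f, const std::vector<int>& v, std::size_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f(v);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling time_ns(count_lt_901, data, r1) and time_ns(count_le_900, data, r2) on the same data must give r1 == r2; any difference between the two timings should be treated as noise unless it survives many repeated runs.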

I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time.

This one is a little tricky to answer, but here's what I can give: in the Intel Instruction Set Reference, they are all grouped together under one common instruction, Jcc (Jump if condition is met).

The same grouping is used in the Optimization Reference Manual, in Appendix C, Latency and Throughput.

Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.

Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again.
For many instructions, the throughput of an instruction can be significantly less than its latency.

The values for Jcc are:

      Latency   Throughput
Jcc     N/A        0.5

with the following footnote on Jcc:

7) Selection of conditional jump instructions should be based on the recommendation of section Section 3.4.1, “Branch Prediction Optimization,” to improve the predictability of branches.
When branches are predicted successfully, the latency of jcc is effectively zero.

So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.

If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS to determine whether the conditions are met.

There is then no reason that an instruction testing two bits should take any more or less time than one testing only one (ignoring gate propagation delay, which is much less than the clock period).
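
The flag logic described here can be modeled in a few lines. The sketch below is a simplified software model, not a description of real hardware: the Flags struct and the cmp32, jl, jle, jg, jge, and check_jcc_model names are invented for illustration. It computes ZF, SF, and OF the way a 32-bit cmp would, evaluates each jump condition as a small boolean expression on those bits, and checks the result against the corresponding C++ operator.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simplified model of the EFLAGS bits that a 32-bit `cmp a, b` produces.
struct Flags {
    bool zf;  // zero flag: a - b is zero
    bool sf;  // sign flag: top bit of the wrapped result
    bool of;  // overflow flag: signed overflow occurred in a - b
};

Flags cmp32(std::int32_t a, std::int32_t b) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    const std::uint32_t r = ua - ub;  // wraps modulo 2^32
    // Overflow: the operands have different signs and the result's sign
    // differs from the sign of a.
    const bool overflow = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;
    return Flags{r == 0, (r >> 31) != 0, overflow};
}

// Each condition is a tiny boolean expression on one to three flag bits.
bool jl(Flags f)  { return f.sf != f.of; }           // a <  b
bool jle(Flags f) { return f.zf || f.sf != f.of; }   // a <= b
bool jg(Flags f)  { return !f.zf && f.sf == f.of; }  // a >  b
bool jge(Flags f) { return f.sf == f.of; }           // a >= b

// Checks the model against the built-in operators on a few edge cases.
void check_jcc_model() {
    const std::int32_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int32_t hi = std::numeric_limits<std::int32_t>::max();
    const std::int32_t samples[] = {lo, -1, 0, 1, 900, 901, hi};
    for (std::int32_t a : samples) {
        for (std::int32_t b : samples) {
            const Flags f = cmp32(a, b);
            assert(jl(f) == (a < b));
            assert(jle(f) == (a <= b));
            assert(jg(f) == (a > b));
            assert(jge(f) == (a >= b));
        }
    }
}
```

In this model, jl and jle differ only by one extra OR with ZF, which is the kind of single-gate difference the paragraph above refers to.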

Edit: Floating Point

This holds true for x87 floating point as well (pretty much the same code as above, but with double instead of int):

        fld     QWORD PTR [esp+32]
        fld     QWORD PTR [esp+40]
        fucomip st, st(1)              ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
        fstp    st(0)
        seta    al                     ; Set al if above (CF=0 and ZF=0).
        test    al, al
        je      .L2
        ; Do something 1
.L2:

        fld     QWORD PTR [esp+32]
        fld     QWORD PTR [esp+40]
        fucomip st, st(1)              ; (same thing as above)
        fstp    st(0)
        setae   al                     ; Set al if above or equal (CF=0).
        test    al, al
        je      .L5
        ; Do something 2
.L5:
        leave
        ret
