No, it will not be faster on most architectures.
You didn't specify, but on x86, all of the integral comparisons will typically be implemented in two machine instructions:
Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c:
if (a < b) {
// Do something 1
}
Compiles to:
mov eax, DWORD PTR [esp+24] ; a
cmp eax, DWORD PTR [esp+28] ; b
jge .L2 ; jump if a is >= b
; Do something 1
.L2:
And
if (a <= b) {
// Do something 2
}
Compiles to:
mov eax, DWORD PTR [esp+24] ; a
cmp eax, DWORD PTR [esp+28] ; b
jg .L5 ; jump if a is > b
; Do something 2
.L5:
So the only difference between the two is a jg versus a jge instruction.
The two will take the same amount of time.
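The original test.c is not shown, so here is a minimal sketch of a file that could be fed to the same gcc command to inspect the output. The function names `less_than` and `less_or_equal` are hypothetical, and the exact stack offsets and labels will likely differ from the listings above. With optimization enabled, the compiler may also replace the branch with something like setl/setle.

```c
/* test.c: hypothetical reconstruction; the original file is not shown. */

/* Returns 1 when the `a < b` branch body runs, 0 otherwise. */
int less_than(int a, int b)
{
    int taken = 0;
    if (a < b) {
        taken = 1;  /* stands in for "Do something 1" */
    }
    return taken;
}

/* Returns 1 when the `a <= b` branch body runs, 0 otherwise. */
int less_or_equal(int a, int b)
{
    int taken = 0;
    if (a <= b) {
        taken = 1;  /* stands in for "Do something 2" */
    }
    return taken;
}
```

No main is needed for -S, since that option only generates assembly. Comparing the two functions in the resulting test.s should show the same mov/cmp pair followed by a different conditional jump.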
I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time.
This one is a little tricky to answer, but here's what I can give: in the Intel Instruction Set Reference, they are all grouped together under one common instruction, Jcc (Jump if condition is met).
The same grouping is made in the Optimization Reference Manual, in Appendix C, Latency and Throughput.
Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.
Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again.
For many instructions, the throughput of an instruction can be significantly less than its latency.
The values for Jcc are:
        Latency   Throughput
Jcc     N/A       0.5
with the following footnote on Jcc:
7) Selection of conditional jump instructions should be based on the recommendation of section Section 3.4.1, “Branch Prediction Optimization,” to improve the predictability of branches.
When branches are predicted successfully, the latency of jcc is effectively zero.
So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.
If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS to determine whether the conditions are met.
There is, then, no reason that an instruction testing two bits should take any more or less time than one testing only one bit (ignoring gate propagation delay, which is much less than the clock period).
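To make the flag-testing argument concrete, here is a small software model of it. This is only a sketch: `struct flags`, `cmp32` and the `cond_*` helpers are made-up names, and the code imitates the documented Jcc conditions rather than the real circuitry. The point is that each condition is a tiny boolean expression over bits that `cmp` has already produced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the EFLAGS bits relevant to signed compares. */
struct flags {
    bool zf;  /* zero flag: a - b is zero */
    bool sf;  /* sign flag: top bit of a - b */
    bool of;  /* overflow flag: a - b overflowed as a signed subtract */
};

/* Models what `cmp a, b` computes: a 32-bit subtract that only sets flags. */
struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t diff = ua - ub;  /* wraps around, like the hardware subtract */
    struct flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 31) != 0;
    /* Overflow: operands differ in sign and the result's sign differs from a's. */
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
    return f;
}

/* Conditions as documented for the Jcc family. */
bool cond_l(struct flags f)  { return f.sf != f.of; }           /* jl:  SF != OF         */
bool cond_ge(struct flags f) { return f.sf == f.of; }           /* jge: SF == OF         */
bool cond_le(struct flags f) { return f.zf || f.sf != f.of; }   /* jle: ZF or SF != OF   */
bool cond_g(struct flags f)  { return !f.zf && f.sf == f.of; }  /* jg:  !ZF and SF == OF */
```

For example, `cond_l(cmp32(a, b))` should agree with `a < b` and `cond_le(cmp32(a, b))` with `a <= b`. The "or equal" variants differ only by one extra term on ZF.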
Edit: Floating Point
This holds true for x87 floating point as well (pretty much the same code as above, but with double instead of int):
fld QWORD PTR [esp+32]
fld QWORD PTR [esp+40]
fucomip st, st(1) ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
fstp st(0)
seta al ; Set al if above (CF=0 and ZF=0).
test al, al
je .L2
; Do something 1
.L2:
fld QWORD PTR [esp+32]
fld QWORD PTR [esp+40]
fucomip st, st(1) ; (same thing as above)
fstp st(0)
setae al ; Set al if above or equal (CF=0).
test al, al
je .L5
; Do something 2
.L5:
leave
ret
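The C source behind the x87 listings isn't shown either. A minimal guess, assuming it mirrors the integer example, with hypothetical function names `less_than_d` and `less_or_equal_d`, is below. The comments assume a is loaded first and b second, so that ST(0) holds b and ST(1) holds a when fucomip runs. The exact instructions will depend on the GCC version and target flags.

```c
/* Hypothetical double version of test.c; the original is not shown. */

/* With ST(0) = b and ST(1) = a, `a < b` is the same test as "b above a",
   which would explain the seta in the first listing. */
int less_than_d(double a, double b)
{
    int taken = 0;
    if (a < b) {
        taken = 1;  /* stands in for "Do something 1" */
    }
    return taken;
}

/* Likewise, `a <= b` is "b above or equal to a", matching setae. */
int less_or_equal_d(double a, double b)
{
    int taken = 0;
    if (a <= b) {
        taken = 1;  /* stands in for "Do something 2" */
    }
    return taken;
}
```

As in the integer case, the two functions should differ by a single instruction: seta versus setae.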