Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
752 views
in Technique[技术] by (71.8m points)

assembly - imulq and unsigned long long overflow detection in C and asm

As somebody new to assembly, I use gcc for reverse engineering. But now I ran in a somehow funny problem: I try to multiply two 64bit-integer for x86-64. The C - code looks as follows:

unsigned long long 
val(unsigned long long a, unsigned long long b){
    return a*b;
}

and compiled with gcc:

val:
    movq    %rdi, %rax
    imulq   %rsi, %rax
    ret

It might be counterintuitive to use signed multiplication for unsigned integers, but it works for C.

However, I would like to check the multiplication for overflows. Now, the overflow flag is set if the result is greater than 2^63-1 (I guess because it is signed multiplication after all). But for unsigned 64bit this would be still OK as long as the result is not greater than 2^64-1.

What is the right way to do the multiplication (in assembly) in this case?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

It looks like you can't use imul without a bunch of extra code, since CF and OF are both set the same way. As the "operation" section of the manual describes, they're set if the full 128b result doesn't match sign_extend(low_half_result). So you're right, even the multi-operand forms of imul still have some signed behaviour. It would be nice if they were like add/sub and set OF and CF independently, so you could look at CF for unsigned data or OF for signed data.

One of the best ways to find a good asm sequence for something is to ask a compiler. C doesn't have convenient integer-overflow detection, but Rust does.

I compiled this function to return the value and the unsigned-wraparound detection bool. Apparently Rust's ABI returns them passing pointer as a hidden first arg, instead of in rdx:rax like I think the C ABI would for such a small struct. :(

pub fn overflowing_mul(a: u64, b: u64) -> (u64, bool) {
  a.overflowing_mul(b)
}
    # frame-pointer boilerplate elided
    mov     rax, rsi
    mul     rdx
    mov     qword ptr [rdi], rax
    seto    byte ptr [rdi + 8]

    mov     rax, rdi                # return the pointer to the return-value
    ret

Asm output from the Godbolt compiler explorer (Rust 1.7.0). This more or less confirms that the mov instruction and the extra uop for a one-operand full multiply is more efficient than anything we could do with extra checks after a two-operand imul.

The documentation for mul says

"The OF and CF flags are set to 0 if the upper half of the result is 0; otherwise, they are set to 1."

So in summary, use mul and check OF or CF to see if the high half is non-zero.


mul vs. imul trivia:

Only the upper half of the full-multiply (N x N => 2N) result differs between imul and mul. I think Intel picked imul as the one that would have multiple explicit operands so that
imul r32, r32, sign-extended-imm8 would make more sense, because sign-extension is probably more useful than zero-extension.

I only just realised that the flag results from imul were signed-only, though. Interesting point.


why does gcc not use mul for unsigned multiplication?

Because one-operand mul/imul are slower (2 uops instead of 1 on Intel CPUs, according to Agner Fog's insn tables. See also the tag wiki). They also use more registers: They require one of their inputs in rax, and produce their outputs in rdx:rax, so extra mov instructions are usually required to move data in/out of those regs.

Thus, imul r64, r64 is a better choice than mul r64, if you don't care about the flag result.

On Intel CPUs imul r64,r64 is actually faster than mul r32. That's not the case on some other CPUs, including AMD Bulldozer-family where 64bit multiplies are somewhat slower. But since mul r32 puts its results into edx:eax instead of just one destination register, they're not direct replacements for each other in most cases anyway.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...