Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


assembly - How to access the carry flag while adding two 64 bit numbers using asm in C

Thanks @PeterCordes, that works, and __int128 works too. One more thing: you mentioned the multiprecision-arithmetic intrinsic _addcarry_u64 from the header file immintrin.h. I tried it with the following code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <immintrin.h>

unsigned char _addcarry_u64(unsigned char c_in, uint64_t src1, uint64_t src2,uint64_t *sum);

int main()
{
    unsigned char carry;
    uint64_t sum;
    long long int c1=0,c2=0;
    uint64_t a=0x0234BDFA12CD4379,b=0xA8DB4567ACE92B38;
    carry = _addcarry_u64(0,a,b,&sum);
    printf("sum is %lx and carry value is %u \n",sum,carry);
    return 0;
}

Can you please point out the error? I'm getting an undefined reference to _addcarry_u64. A quick Google search didn't answer this. Is there another header file I should include, or is the intrinsic not compatible with gcc, and if so, why?

Initially I had this code for adding two 64 bit numbers:

static __inline int is_digit_lessthan_ct(digit_t x, digit_t y)
{ // Is x < y?
    return ( int)((x ^ ((x ^ y) | ((x - y) ^ y))) >> (RADIX-1)); 
}


#define ADDC(carryIn, addend1, addend2, carryOut, sumOut) \
       { digit_t tempReg = (addend1) + (int)(carryIn);    \
                (sumOut) = (addend2) + tempReg;           \
              (carryOut) = (is_digit_lessthan_ct(tempReg, (int)(carryIn)) | is_digit_lessthan_ct((sumOut), tempReg)); \
 }

I have since learned that this implementation can be made faster with assembly language, so I am trying to do something similar. However, I cannot access or return the carry. Here is my code:

#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
uint64_t add32(uint64_t a,uint64_t b)
{
    uint64_t d=0,carry=0;
    __asm__("mov %1,%%rax\n"
            "adc %2,%%rax\n"
            "mov %%rax,%0\n"
            :"=r"(d)
            :"r"(a),"r"(b)
            :"%rax"
           );
    return d;
}
int main()
{
    uint64_t a=0xA234BDFA12CD4379,b=0xA8DB4567ACE92B38;
    printf("Sum = %lx \n",add32(a,b));
    return 0;
}

The result of this addition should be 14B100361BFB66EB1, where the leading 1 is the carry. I want to save that carry in another register. I tried jc, but I keep getting one error or another. Even setc gave me an error, maybe because I'm not sure of the syntax. Can anyone tell me how to save the carry in another register, or return it, by modifying this code?


1 Answer


As usual, inline asm is not strictly necessary; see https://gcc.gnu.org/wiki/DontUseInlineAsm. But compilers currently kinda suck at actual extended-precision addition, so you might want asm for this.

There's an Intel intrinsic for adc: _addcarry_u64. Unfortunately, gcc and clang may make slow code with it. In GNU C on a 64-bit platform, you could just use unsigned __int128.
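As a rough sketch of both options just mentioned, something like the following should work. The struct and function names here are made up for illustration, and the intrinsic path assumes an x86-64 target with a gcc or clang whose immintrin.h provides _addcarry_u64. Note that the header already declares the intrinsic, and that its output pointer is unsigned long long *, so you shouldn't add your own prototype for it.

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>   /* declares _addcarry_u64; no hand-written prototype needed */
#endif

/* Hypothetical helper type: 64-bit sum plus the carry-out bit. */
struct sum_carry { uint64_t sum; unsigned carry; };

/* GNU C on a 64-bit target: add in 128 bits, then split the result. */
struct sum_carry add_u64_int128(uint64_t a, uint64_t b)
{
    unsigned __int128 wide = (unsigned __int128)a + b;
    struct sum_carry r = { (uint64_t)wide, (unsigned)(wide >> 64) };
    return r;
}

#if defined(__x86_64__)
/* Same result through the intrinsic (x86-64 only). */
struct sum_carry add_u64_intrinsic(uint64_t a, uint64_t b)
{
    unsigned long long sum;   /* the intrinsic writes through unsigned long long * */
    unsigned char carry = _addcarry_u64(0, a, b, &sum);
    struct sum_carry r = { (uint64_t)sum, carry };
    return r;
}
#endif
```

With the inputs from the question, 0xA234BDFA12CD4379 + 0xA8DB4567ACE92B38, both versions should give sum 0x4B100361BFB66EB1 with carry 1.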


Compilers usually manage to make pretty good code when checking for carry-out from addition using the idiom that carry_out = (x+y) < x, where < is an unsigned compare. For example:

struct long_carry { unsigned long res; unsigned carry; };

struct long_carry add_carryout(unsigned long x, unsigned long y) {
    unsigned long retval = x + y;
    unsigned carry = (retval < x);
    return (struct long_carry){ retval, carry };
}

gcc7.2 -O3 emits this (and clang emits similar code):

    mov     rax, rdi        # because we need return value in a different register
    xor     edx, edx        # set up for setc
    add     rax, rsi        # generate carry
    setc    dl              # save carry.
    ret                     # return with rax=sum, edx=carry  (SysV ABI struct packing)

There's no way you can do better than this with inline asm; this function already looks optimal for modern CPUs. (Well I guess if mov wasn't zero latency, doing the add first would shorten the latency to carry being ready. But on Intel CPUs, it's supposed to be better to overwrite mov-elimination results right away, so it's better to mov first and then add.)


Clang will even use adc to feed the carry-out from one add into the next add as carry-in, but only for the first limb. Update: perhaps that is because this function is broken: carry_out = (x+y) < x doesn't work when there's carry-in. With carry_out = (x+y+c_in) < x, y+c_in can wrap to zero and give you (x+0) < x (false) even though there was carry.

Notice that clang's cmp/adc reg,0 exactly implements the behaviour of the C, which isn't the same as another adc there.

Anyway, gcc doesn't even use adc the first time, when it is safe. (So use unsigned __int128 for code that doesn't suck, and asm for integers even wider than that).

// BROKEN with carry_in=1 and y=~0UL
static
unsigned adc_buggy(unsigned long *sum, unsigned long x, unsigned long y, unsigned carry_in) {
    *sum = x + y + carry_in;
    unsigned carry = (*sum < x);
    return carry;
}

// *x += *y
void add256(unsigned long *x, unsigned long *y) {
    unsigned carry;
    carry = adc_buggy(x, x[0], y[0], 0);
    carry = adc_buggy(x+1, x[1], y[1], carry);
    carry = adc_buggy(x+2, x[2], y[2], carry);
    carry = adc_buggy(x+3, x[3], y[3], carry);
}

    mov     rax, qword ptr [rsi]
    add     rax, qword ptr [rdi]
    mov     qword ptr [rdi], rax

    mov     rax, qword ptr [rdi + 8]
    mov     r8, qword ptr [rdi + 16]   # hoisted
    mov     rdx, qword ptr [rsi + 8]
    adc     rdx, rax                   # ok, no memory operand but still adc
    mov     qword ptr [rdi + 8], rdx

    mov     rcx, qword ptr [rsi + 16]   # r8 was loaded earlier
    add     rcx, r8
    cmp     rdx, rax                    # manually check the previous result for carry.  /facepalm
    adc     rcx, 0

    ...

This sucks, so if you want extended-precision addition, you still need asm. But for getting the carry-out into a C variable, you don't.
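For completeness, here is one way the carry-in case could be written correctly in plain C, as a sketch rather than a tuned implementation. The names adc_fixed and add_limbs are hypothetical, and carry_in is assumed to be 0 or 1. The idea is to check each of the two partial additions for wrap-around separately instead of comparing the final sum against x once. This says nothing about whether a given compiler turns it into an adc chain.

```c
/* Add x + y + carry_in, store the low word in *sum, return the carry-out.
   Assumes carry_in is 0 or 1. */
unsigned adc_fixed(unsigned long *sum, unsigned long x, unsigned long y,
                   unsigned carry_in)
{
    unsigned long t = x + y;
    unsigned c1 = (t < x);            /* carry out of x + y */
    unsigned long s = t + carry_in;
    unsigned c2 = (s < t);            /* carry out of adding carry_in */
    *sum = s;
    return c1 | c2;                   /* at most one of the two can be set */
}

/* x += y over n limbs, least-significant limb first.
   Returns the carry out of the top limb. */
unsigned add_limbs(unsigned long *x, const unsigned long *y, unsigned n)
{
    unsigned carry = 0;
    for (unsigned i = 0; i < n; i++)
        carry = adc_fixed(&x[i], x[i], y[i], carry);
    return carry;
}
```

The case that breaks the single-compare version, carry_in=1 with y=~0UL, now reports a carry: y+carry_in no longer gets a chance to wrap to zero unnoticed, because the carry-in is added in a second step that has its own check.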

