floating point - Return a float from a 64-bit assembly function that uses x87 FPU

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am trying to make a program that calculates equations (what equation doesn't matter currently) that use 64-bit registers, floats, and coprocessor instructions. Unfortunately I don't know how to access the final outcome of the equation as a float. I can do:

fist qword ptr [bla]
mov rax,bla

and change the function type to INT and get my value, but I cannot access it as a FLOAT. Even when I leave the result in ST(0) (the top of the coprocessor stack) it doesn't work as expected and my C++ program gets the wrong result. My assembly code is:

public funct
.data
bla qword ?
bla2 qword 10.0
.code
funct PROC
push rbp
mov rbp, rsp
push rbx

mov bla,rcx
fild qword ptr[bla]

fld qword ptr [bla2]
fmul st(0), st(1)
fist dword ptr [bla]
pop rbx
pop rbp
ret
funct ENDP
END

My C++ code is:

#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>

extern "C" float funct(long long n);
int main(){

    float value1= funct(3);

    return 0;
}

What is the problem, and how can I fix it?



1 Answer

深蓝 · Answer 1 · 2021-10-23T21:33:58+0000

Your question is a bit ambiguous, and so is your code. I'll present a few ideas using both x87 FPU and SSE instructions. The usage of x87 FPU instructions is discouraged in 64-bit code, and SSE/SSE2 is preferred. SSE/SSE2 are available on all 64-bit AMD and Intel x86 processors.

32-bit float in 64-bit code using x87 FPU

If your question is "How do I write 64-bit assembler code that uses 32-bit floats using the x87 FPU?" then your C++ code looks fine, but your assembler code needs some work. Your C++ code declares the return type of the function as a 32-bit float:

extern "C" float funct(long long n);

We need to create a function that returns a 32-bit float. Your assembler code could be modified in the following fashion. I am keeping the stack frame code and the push/pop of RBX in your code, since I assume you were just giving us a minimal example and that your real code is using RBX. With that in mind the following code should work:

public funct
.data
ten REAL4 10.0                     ; Define variable ten as 32-bit (4-byte float)
                                   ; REAL4 and DWORD are both same size. 
                                   ; REAL4 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx

    mov [rbp+16],rcx               ; 32 byte shadow space is just above the return address
                                   ; at RBP+16 (this address is 16 byte aligned). Rather 
                                   ; than use a temporary variable in the data section to 
                                   ; store the value of RCX, we just store it to the 
                                   ; shadow space on the stack.
    fild QWORD ptr[rbp+16]         ; Load and convert 64-bit integer into st(0)
    fld [ten]                      ; st(0) => st(1), st(0) = 10.0
    fmulp                          ; st(1)=st(1)*st(0), st(1) => st(0)
    fstp REAL4 ptr [rbp+16]        ; Store result to shadow space as 32-bit float
    movss xmm0, REAL4 ptr [rbp+16] ; Load scalar single (32-bit float) into xmm0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.

    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

I've commented the code, but the thing that might be of interest is that I don't use a second variable in the DATA section. The 64-bit Windows calling convention requires the caller of a function to ensure the stack is aligned on a 16-byte boundary and that there is a 32-byte shadow space (AKA register parameter area) allocated before making a call. This area can be used as a scratch area. Since we set up a stack frame, the saved RBP is at RBP+0, the return address is at RBP+8 and the shadow space starts at RBP+16. If you weren't using a stack frame then the return address would be at RSP+0 and the shadow space would start at RSP+8. We can store the result of our floating point operation there instead of in the QWORD you labelled bla.

It is a good idea to unwind the floating point stack so nothing remains on it before we exit our function. I use the x87 instructions that pop the register stack (FMULP and FSTP) once we are done with the values on it.

The 64-bit Microsoft calling convention requires floating point values to be returned in XMM0. We use the SSE instruction MOVSS to move a scalar single (32-bit float) to the XMM0 register. That is where the C++ code will expect that value to be returned.
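To check that the value really arrives in XMM0 as a float, it can help to compare the assembly routine's result against the same computation written in plain C++. This is only a sketch: `funct_reference` and `matches_reference` are hypothetical names that do not appear in the question, and the comparison assumes small inputs such as 3. For very large `n` the x87 path and this reference round at different steps, so the last bit may differ.

```cpp
#include <cstdio>

// Hypothetical reference: the same "n * 10.0" computation in plain C++.
float funct_reference(long long n) {
    return static_cast<float>(n) * 10.0f;
}

// Hypothetical helper: prints the value returned by the assembly routine
// next to the reference and reports whether they are identical.
bool matches_reference(long long n, float from_asm) {
    const float expected = funct_reference(n);
    std::printf("n=%lld asm=%f expected=%f\n", n, from_asm, expected);
    return from_asm == expected;
}
```

In the original `main` you would call it as `matches_reference(3, funct(3))`, with `funct` linked in from the assembly file.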

32-bit float in 64-bit code using SSE

Building on the ideas in the section above, we can modify the code to use SSE instructions with 32-bit floats. An example of such code is as follows:

public funct
.data
ten REAL4 10.0                     ; Define variable ten as 32-bit (4-byte float)
                                   ; REAL4 and DWORD are both same size. 
                                   ; REAL4 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx
    cvtsi2ss xmm0, rcx             ; Convert scalar integer in RCX to 
                                   ;    scalar single(float) and store in XMM0
    mulss xmm0, [ten]              ; 32-bit float multiply by 10.0 store in XMM0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.
    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

This code removes the usage of the x87 FPU by using SSE instructions. In particular we use:

    cvtsi2ss xmm0, rcx             ; Convert scalar integer in RCX to 
                                   ;    scalar single(float) and store in XMM0

CVTSI2SS converts a scalar integer to a scalar single (float). In this case the 64-bit integer value in RCX is converted to a 32-bit float and stored in XMM0. XMM0 is the register we'll be placing our returned value into. XMM0 to XMM5 are considered volatile so we don't need to save their values.

    mulss xmm0, [ten]              ; 32-bit float multiply by 10.0 store in XMM0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.

MULSS is the SSE instruction for scalar single (float) multiplication. Here MULSS computes XMM0 = XMM0 * (32-bit float memory operand), that is, it multiplies the value in XMM0 by the 32-bit float 10.0. Since XMM0 now holds our final result, there is nothing more to do but properly exit the function.
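If you would rather stay in C++, the same two instructions are reachable through compiler intrinsics. The following is a sketch, assuming an x86-64 compiler that provides `<immintrin.h>`; the function name `funct_sse` is hypothetical and not part of the original code.

```cpp
#include <immintrin.h>  // SSE intrinsics; x86-64 compilers only

// Hypothetical C++ counterpart of the SSE assembly routine above.
float funct_sse(long long n) {
    // CVTSI2SS: convert the 64-bit integer to a float in the low lane.
    __m128 v = _mm_cvtsi64_ss(_mm_setzero_ps(), n);
    // MULSS: multiply the low lane by 10.0f.
    v = _mm_mul_ss(v, _mm_set_ss(10.0f));
    // Extract the low lane as a float (the value that would sit in XMM0).
    return _mm_cvtss_f32(v);
}
```

The compiler takes care of the calling convention here, so there is no shadow space or return register to manage by hand.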

64-bit double float in 64-bit code using x87 FPU

This is a variation on the first example, but now we are using 64-bit floats also known as the double type in C++, REAL8 (or QWORD) in assembler, and a scalar double in SSE2. Since we are now using double as the return type we have to modify the C++ code to be:

#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>

extern "C" double funct(long long n);

int main() {    
    double value1 = funct(3);

    return 0;
}

The assembly code would look like:

public funct
.data
ten REAL8 10.0                     ; Define variable ten as 64-bit (8-byte float)
                                   ; REAL8 and QWORD are both same size. 
                                   ; REAL8 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx

    mov [rbp+16],rcx               ; 32 byte shadow space is just above the return address
                                   ; at RBP+16 (this address is 16 byte aligned). Rather 
                                   ; than use a temporary variable in the data section to 
                                   ; store the value of RCX, we just store it to the 
                                   ; shadow space on the stack.
    fild QWORD ptr[rbp+16]         ; Load and convert 64-bit integer into st(0)
    fld [ten]                      ; st(0) => st(1), st(0) = 10.0
    fmulp                          ; st(1)=st(1)*st(0), st(1) => st(0)
    fstp REAL8 ptr [rbp+16]        ; Store result to shadow space as 64-bit float
    movsd xmm0, REAL8 ptr [rbp+16] ; Load scalar double (64-bit float) into xmm0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.

    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

This code is nearly identical to the x87 code using 32-bit float. We are using REAL8 (same as QWORD) to store a 64-bit float and use MOVSD to move a 64-bit double float (scalar double) to XMM0. MOVSD is an SSE2 instruction. It is important to return the proper size float in XMM0. Had you used MOVSS the value returned to the C++ function would likely be incorrect.
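To see why the size matters, here is a small C++ sketch of what the caller would read if only the low 32 bits of the 64-bit result were loaded into XMM0 (a MOVSS load from memory clears the remaining bits of the register). The helper name `low32_as_double` is hypothetical and exists only for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: the double a caller would see if only the low
// 32 bits of `value` reached XMM0 and the upper bits were zero.
double low32_as_double(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // raw IEEE 754 bit pattern
    bits &= 0xFFFFFFFFu;                      // keep the low 32 bits only
    double seen;
    std::memcpy(&seen, &bits, sizeof seen);   // reinterpret as a double
    return seen;
}
```

For the result 30.0 the bit pattern is 0x403E000000000000, so the low 32 bits are all zero and the caller would see 0.0 instead of 30.0.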

64-bit double float in 64-bit code using SSE2

This is a variation on the second example, but now we are using 64-bit floats also known as the double type in C++, REAL8 (or QWORD) in assembler, and a scalar double in SSE2. The C++ code should use the code from the previous section so that double is used instead of float. The assembler code would be similar to this:

public funct
.data
ten REAL8 10.0                     ; Define variable ten as 64-bit (8-byte float)
                                   ; REAL8 and QWORD are both same size. 
                                   ; REAL8 makes for more readable code when using floats
.code
funct PROC
    push rbp
    mov rbp, rsp                   ; Setup stack frame
                                   ; RSP aligned to 16 bytes at this point
    push rbx
    cvtsi2sd xmm0, rcx             ; Convert scalar integer in RCX to 
                                   ;    scalar double(double float) and store in XMM0
    mulsd xmm0, [ten]              ; 64-bit float multiply by 10.0 store in XMM0
                                   ; XMM0 = return value for 32(and 64-bit) floats
                                   ;        in 64-bit code.
    pop rbx
    mov rsp, rbp                   ; Remove stack frame
    pop rbp
    ret
funct ENDP
END

The primary difference from the second example is that we use CVTSI2SD instead of CVTSI2SS. SD in the instruction means we are converting to a scalar double (64-bit double float). Similarly we use the MULSD instruction for multiplication using scalar doubles. XMM0 will hold the 64-bit scalar double (double float) that will be returned to the calling function.
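The double version can also be sketched with SSE2 intrinsics in C++. As before, this assumes an x86-64 compiler with `<immintrin.h>`, and `funct_sse2` is a hypothetical name used only for this example.

```cpp
#include <immintrin.h>  // SSE2 intrinsics; x86-64 compilers only

// Hypothetical C++ counterpart of the SSE2 assembly routine above.
double funct_sse2(long long n) {
    // CVTSI2SD: convert the 64-bit integer to a double in the low lane.
    __m128d v = _mm_cvtsi64_sd(_mm_setzero_pd(), n);
    // MULSD: multiply the low lane by 10.0.
    v = _mm_mul_sd(v, _mm_set_sd(10.0));
    // Extract the low lane as a double (the value that would sit in XMM0).
    return _mm_cvtsd_f64(v);
}
```

One practical reason to prefer double: an input such as 16777217 (2^24 + 1) is converted exactly here, whereas a 32-bit float cannot represent it.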
