
c - Accuracy of floating point arithmetic

I'm having trouble understanding the output of this program:

#include <stdio.h>

int main(void)
{
    double x = 1.8939201459282359e-308;
    double y = 4.9406564584124654e-324;
    printf("%23.16e\n", 1.6*y);
    printf("%23.16e\n", 1.7*y);
    printf("%23.16e\n", 1.8*y);
    printf("%23.16e\n", 1.9*y);
    printf("%23.16e\n", 2.0*y);
    printf("%23.16e\n", x + 1.6*y);
    printf("%23.16e\n", x + 1.7*y);
    printf("%23.16e\n", x + 1.8*y);
    printf("%23.16e\n", x + 1.9*y);
    printf("%23.16e\n", x + 2.0*y);
    return 0;
}

The output is

9.8813129168249309e-324
9.8813129168249309e-324
9.8813129168249309e-324
9.8813129168249309e-324
9.8813129168249309e-324
1.8939201459282364e-308
1.8939201459282364e-308
1.8939201459282369e-308
1.8939201459282369e-308
1.8939201459282369e-308

I'm using IEEE arithmetic. The variable y holds the smallest positive IEEE 754 double, the denormal 2^-1074. The first five prints show a number that is twice y, as I would expect. What confuses me is that the next five prints show different numbers. If 1.6*y is the same as 2.0*y, then how can x + 1.6*y be different from x + 2.0*y?


1 Answer


In a nutshell

You say that your compiler is Visual C++ 2010 Express. I do not have access to this compiler, but I understand that it generates programs that initially configure the x87 FPU to use 53 bits of precision for the significand, in order to emulate IEEE 754 double-precision computations as closely as possible.
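As a minimal sketch (this uses the MSVC-specific _controlfp from <float.h> and only makes sense for 32-bit x87 builds, so it will not compile or apply elsewhere), you can inspect and change the precision-control field that such a program sets:

/* Sketch: read and modify the x87 precision-control field.
   MSVC-specific; assumes a 32-bit x86 target using the x87 FPU. */
#include <float.h>
#include <stdio.h>

int main(void)
{
    unsigned int cw = _controlfp(0, 0);   /* read the current control word */
    printf("53-bit significand: %s\n",
           (cw & _MCW_PC) == _PC_53 ? "yes" : "no");
    _controlfp(_PC_64, _MCW_PC);          /* switch to the full 64-bit significand */
    return 0;
}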

Unfortunately, “as closely as possible” is not always close enough. The historical 80-bit x87 registers can have the width of their significand limited for the purpose of emulating double precision, but they always retain the full 15-bit exponent range of the extended format. The difference shows in particular when manipulating denormals (like your y).
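A minimal sketch of that effect, assuming an x86 target where long double is the x87 80-bit format (e.g. GCC; on MSVC long double is just double, so this will not show anything there): the product kept in the extended exponent range is a normal number with its true value, while the double-rounded product collapses to a multiple of 2^-1074.

/* Sketch, assuming long double is the x87 80-bit extended format. */
#include <stdio.h>

int main(void)
{
    double y = 4.9406564584124654e-324;       /* 2^-1074, smallest denormal */
    long double ext = 1.6L * (long double)y;  /* full exponent range: a normal number, ~7.9e-324 */
    double dbl = 1.6 * y;                     /* rounded to a multiple of 2^-1074: 9.88...e-324 == 2*y */
    printf("extended: %.20Le\n", ext);
    printf("double:   %.20e\n",  dbl);
    return 0;
}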

What happens

My explanation would be that in printf("%23.16e\n", 1.6*y);, 1.6*y is computed as an 80-bit reduced-significand full-exponent number (it is thus a normal number), then converted to an IEEE 754 double (resulting in a denormal), then printed.

On the other hand, in printf("%23.16e\n", x + 1.6*y);, x + 1.6*y is computed entirely with 80-bit reduced-significand, full-exponent numbers (again, all intermediate results are normal numbers), then converted to an IEEE 754 double, then printed.

This would explain why 1.6*y prints the same as 2.0*y but has a different effect when added to x. The number that is printed is a double-precision denormal. The number that is added to x is an 80-bit reduced-significand full-exponent normal number (not the same value as the printed denormal).
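If that explanation is right, forcing the product through a double before the addition should make the sums agree even on an x87 build. A minimal sketch (volatile forces the rounded value to be stored to and reloaded from memory):

/* Sketch: round the products to double first, then add. */
#include <stdio.h>

int main(void)
{
    double x = 1.8939201459282359e-308;
    double y = 4.9406564584124654e-324;
    volatile double p16 = 1.6 * y;   /* rounds to the double denormal 2*y */
    volatile double p20 = 2.0 * y;   /* the same double denormal */
    printf("%23.16e\n", x + p16);    /* now identical ... */
    printf("%23.16e\n", x + p20);    /* ... to this */
    return 0;
}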

What happens with other compilers when generating x87 instructions

Other compilers, such as GCC, do not configure the x87 FPU to use 53-bit significands. This can have the same kind of consequences (here, x + 1.6*y would be computed entirely with 80-bit full-significand, full-exponent numbers, and only converted to double precision for printing or storing to memory). In that case the issue shows up even more often (you do not need to involve denormals or infinities to notice differences).
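You can ask the compiler which evaluation strategy it is using: C99's FLT_EVAL_METHOD, from <float.h>, is 0 when operations are evaluated in their own type (the SSE2 situation) and 2 when they are evaluated with long double range and precision (the x87 situation). A minimal check:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* 0: evaluate in the operand type; 2: evaluate as long double (x87) */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}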

This article by David Monniaux contains all the details you may wish for and more.

Removing the unwanted behavior

To get rid of the problem (if you consider it to be one), find the flag that tells your compiler to generate SSE2 instructions for floating-point computations. These implement IEEE 754 single- and double-precision semantics exactly.
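For example, on the two compilers discussed above (assuming 32-bit x86 targets; x86-64 compilers already use SSE2 by default):

cl /arch:SSE2 example.c             (Visual C++)
gcc -msse2 -mfpmath=sse example.c   (GCC, 32-bit x86)

With either of these, the program above should print the same value for all five of the x + ... sums.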

