Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

floating point - Are there any whole numbers which the double cannot represent within the MIN/MAX range of a double?

I realize that when dealing with IEEE 754 doubles and floats, some numbers can't be represented exactly, especially numbers with lots of digits after the decimal point. This is well understood, but I was curious whether there are any whole numbers within the MIN/MAX range of a double (or float) that can't be represented and thus need to be rounded to the nearest representable IEEE 754 value.

For instance very large numbers are sometimes represented in doubles or floats even if they are whole numbers. Clearly using a straight up int64 or some such large integer datatype would be better but people still use doubles for large numbers every so often.

Are there any numbers that can be called out as non-representable or can you give me a mathematical reason why it wouldn't be a problem?

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…
1 Answer

Sure, there are whole numbers that are not representable as double-precision floating-point values.

All whole numbers whose magnitude does not exceed Pow(2, 53), that is 9007199254740992, are representable. From Pow(2, 53) to Pow(2, 54) (that's 18014398509481984), only even numbers are representable. The odd numbers will be rounded to a neighboring even number.

Of course it continues like that. From Pow(2, 54) to Pow(2, 55) only the multiples of 4 (those whole numbers which 4 divides) are representable, from Pow(2, 55) to Pow(2, 56) only multiples of 8, and so on.

This is because the double-precision floating-point format has 53 bits (binary digits) of precision for the mantissa (significand): 52 bits are stored explicitly and the leading bit is implicit.
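The spacing pattern described above can be checked with a short Python sketch. It assumes that Python's float is an IEEE 754 double (true on typical platforms); the helper names is_exact and spacing are made up for this illustration.

```python
import math
import sys

# Significand width of a Python float; 53 when float is an IEEE 754 double.
MANT_BITS = sys.float_info.mant_dig

def is_exact(n: int) -> bool:
    """Return True if the integer n survives conversion to float unchanged."""
    return int(float(n)) == n

def spacing(n: int) -> float:
    """Gap between neighboring doubles in the binade that contains float(n)."""
    # frexp returns (m, e) with float(n) == m * 2**e and 0.5 <= m < 1.
    _, exponent = math.frexp(float(n))
    return math.ldexp(1.0, exponent - MANT_BITS)

for power in (52, 53, 54, 55):
    base = 2 ** power
    print(power, spacing(base), is_exact(base + 1), is_exact(base + 2))
```

In each printed row the spacing doubles, and base + 1 stops being exact once the spacing exceeds 1.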

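It is easy to verify my claims. For example, take the number 10000000000000001 as a 64-bit integer. Convert it to double and then back to a 64-bit integer. You will see the precision loss: the result is 10000000000000000, because the original number is odd and lies between Pow(2, 53) and Pow(2, 54).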
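A minimal sketch of that round trip in Python follows. Python's int is arbitrary-precision rather than a 64-bit type, so this only mirrors the experiment; the value itself fits in a signed 64-bit integer, and float is assumed to be an IEEE 754 double.

```python
n = 10_000_000_000_000_001   # 10**16 + 1: odd, and larger than 2**53

as_double = float(n)         # rounds to the nearest representable double
back = int(as_double)        # exact conversion of that double back to an integer

print(back)                  # 10000000000000000
print(n - back)              # 1, the amount lost in the round trip
```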

For very large double-precision numbers, only a tiny fraction of the whole numbers is representable. For example, near 1E+300 (which is between Pow(2, 996) and Pow(2, 997)) only multiples of Pow(2, 944) (1.4870169084777831E+284) are representable. This is consistent with the fact that a double is precise to approximately 16 decimal figures. So a whole number with 300 figures will be "remembered" only by its first approx. 16 figures (actually 53 binary digits).
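The claim about 1E+300 can be checked the same way. The sketch below assumes IEEE 754 doubles and derives the gap from math.frexp; the variable names are chosen only for this example.

```python
import math

x = 1e300
_, exponent = math.frexp(x)      # x == m * 2**exponent with 0.5 <= m < 1
gap = 2 ** (exponent - 53)       # spacing of doubles near x, as an exact integer

as_int = int(x)                  # the exact whole number the double really holds
print(exponent - 1)              # 996, so 2**996 <= x < 2**997
print(gap == 2 ** 944)           # True
print(as_int % gap == 0)         # True: the stored value is a multiple of 2**944
print(as_int == 10 ** 300)       # False: 10**300 itself is not representable
print(x + 2.0 ** 942 == x)       # True: adding less than half a gap changes nothing
```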


Addition: The first positive power of ten that is not exactly representable is 1E+23 (or 100 sextillion, short scale naming style). Near that number, only integral multiples of 16777216 (that is Pow(2, 24)) are representable, but ten to the 23rd power is clearly not a multiple of two to the 24th power. The prime factorization is 10**23 == 2**23 * 5**23, so we can divide evenly by two only 23 times, not 24 times as required.
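As a rough check of the 1E+23 claim, the following Python sketch searches for the first non-negative power of ten that does not survive a round trip through float. It assumes IEEE 754 doubles; the function name first_inexact_power_of_ten is hypothetical.

```python
def first_inexact_power_of_ten(limit: int = 308) -> int:
    """Smallest k in 0..limit such that 10**k is not exactly a double, or -1."""
    for k in range(limit + 1):
        if int(float(10 ** k)) != 10 ** k:
            return k
    return -1

k = first_inexact_power_of_ten()
print(k)                                  # 23

stored = int(float(10 ** 23))             # the whole number actually stored
print(stored)                             # 99999999999999991611392
print(10 ** 23 - stored == 2 ** 23)       # True: off by half of the 2**24 gap
print(10 ** 23 % 2 ** 24)                 # 8388608, so not a multiple of 2**24
```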

