You ask about C++, but the specifics of floating-point values and encodings are determined by a floating-point specification, notably IEEE 754, and not by C++. IEEE 754 is by far the most widely used floating-point specification, and I will answer using it.
In IEEE 754, binary floating-point values are encoded with three parts: a sign bit s (0 for positive, 1 for negative), a biased exponent e (the represented exponent plus a fixed offset), and a significand field f (the fraction portion). For normal numbers, these represent exactly the number (−1)^s × 2^(e−bias) × 1.f, where 1.f is the binary numeral formed by writing the significand bits after “1.”. (For example, if the significand field has the ten bits 0010111011, it represents the significand 1.0010111011₂, which is 1.1826171875, or 1211/1024.)
The bias depends on the floating-point format. For 64-bit IEEE 754 binary, the exponent field has 11 bits, and the bias is 1023. When the actual exponent is 0, the encoded exponent field is 1023. Actual exponents of −2, −1, 0, 1, and 2 have encoded exponents of 1021, 1022, 1023, 1024, and 1025. When somebody speaks of the exponent of a subnormal number being zero, they mean the encoded exponent field is zero; the actual exponent would be less than −1022. For 64-bit, the normal exponent interval is −1022 to 1023 (encoded values 1 to 2046). When the exponent moves outside this interval, special things happen.
Above this interval, floating-point stops representing finite numbers. An encoded exponent of 2047 (all 1 bits) represents infinity (with the significand field set to zero). Below this range, floating-point changes to subnormal numbers. When the encoded exponent is zero, the significand field represents 0.f instead of 1.f.
There is an important reason for this. If the lowest exponent value were just another normal encoding, then the lower bits of its significand would be too small to represent as floating-point values by themselves. Without that leading “1.”, there would be no way to say where the first 1 bit was. For example, suppose you had two numbers, both with the lowest exponent, and with significands 1.0010111011₂ and 1.0000000000₂. When you subtract the significands, the result is 0.0010111011₂. Unfortunately, there is no way to represent this as a normal number: because you were already at the lowest exponent, you cannot represent the lower exponent that is needed to say where the first 1 is in this result. Since the mathematical result is too small to be represented, a computer would be forced to return the nearest representable number, which would be zero.
This creates the undesirable property in the floating-point system that you can have `a != b` but `a-b == 0`. To avoid that, subnormal numbers are used. By using subnormal numbers, we have a special interval where the actual exponent does not decrease, and we can perform arithmetic without creating numbers too small to represent. When the encoded exponent is zero, the actual exponent is the same as when the encoded exponent is one, but the value of the significand changes to 0.f instead of 1.f. When we do this, `a != b` guarantees that the computed value of `a-b` is not zero.
Here are the combinations of values in the encodings of 64-bit IEEE 754 binary floating-point:
| Sign | Exponent (e) | Significand Bits (f) | Meaning |
|------|--------------|----------------------|---------|
| 0 | 0 | 0 | +zero |
| 0 | 0 | Non-zero | +2^−1022 × 0.f (subnormal) |
| 0 | 1 to 2046 | Anything | +2^(e−1023) × 1.f (normal) |
| 0 | 2047 | 0 | +infinity |
| 0 | 2047 | Non-zero but high bit off | +, signaling NaN |
| 0 | 2047 | High bit on | +, quiet NaN |
| 1 | 0 | 0 | −zero |
| 1 | 0 | Non-zero | −2^−1022 × 0.f (subnormal) |
| 1 | 1 to 2046 | Anything | −2^(e−1023) × 1.f (normal) |
| 1 | 2047 | 0 | −infinity |
| 1 | 2047 | Non-zero but high bit off | −, signaling NaN |
| 1 | 2047 | High bit on | −, quiet NaN |