The following function is a faithfully-rounded implementation of arctan on [0, 1]:
#include <math.h>

float atan_poly (float a) {
  float s = a * a, u = fmaf(a, -a, 0x1.fde90cp-1f);
  float r1 = 0x1.74dfb6p-9f;                  /* c17 */
  float r2 = fmaf (r1, u, 0x1.3a1c7cp-8f);    /* c15 */
  float r3 = fmaf (r2, s, -0x1.7f24b6p-7f);   /* c13 */
  float r4 = fmaf (r3, u, -0x1.eb3900p-7f);   /* c11 */
  float r5 = fmaf (r4, s, 0x1.1ab95ap-5f);    /* c9 */
  float r6 = fmaf (r5, u, 0x1.80e87cp-5f);    /* c7 */
  float r7 = fmaf (r6, s, -0x1.e71aa4p-4f);   /* c5 */
  float r8 = fmaf (r7, u, -0x1.b81b44p-3f);   /* c3 */
  float r9 = r8 * s;
  float r10 = fmaf (r9, a, a);
  return r10;
}
The following test harness will abort if the function atan_poly fails to be faithfully-rounded on [1e-16, 1] and print "success" otherwise:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int checkit(float f) {
  double d = atan(f);
  float d1 = d, d2 = d;
  if (d1 < d) d2 = nextafterf(d1, 1.0/0.0);
  else d1 = nextafterf(d1, -1.0/0.0);
  float p = atan_poly(f);
  if (p != d1 && p != d2) return 0;
  return 1;
}

int main() {
  for (float f = 1; f > 1e-16; f = nextafterf(f, -1.0/0.0)) {
    if (!checkit(f)) abort();
  }
  printf("success\n");
  exit(0);
}
The problem with using s in every multiplication is that the polynomial's coefficients do not decay rapidly. Inputs close to 1 result in lots and lots of cancellation of nearly equal numbers, meaning you're trying to find a set of coefficients so that the accumulated roundoff at the end of the computation closely approximates the residual of arctan.
The constant 0x1.fde90cp-1f is a number close to 1 for which (arctan(sqrt(x)) - x) / x^3 is very close to the nearest float. That is, it's a constant that goes into the computation of u so that the cubic coefficient is almost completely determined. (For this program, the cubic coefficient must be either -0x1.b81b44p-3f or -0x1.b81b42p-3f.)
Alternating multiplications by s and u has the effect of scaling the roundoff error committed in ri by a factor of at most 1/4 before it reaches r{i+2}, since s*u < 1/4 whatever a is: writing C for the constant 0x1.fde90cp-1f, in exact arithmetic s*u = a^2 * (C - a^2) <= C^2/4, which is about 0.248. This gives considerable leeway in choosing the coefficients of fifth order and beyond.
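As a quick empirical sanity check on that bound (my addition, not part of the original toolchain), the following program scans every float a in (0, 1] (roughly a billion values, so it takes a few seconds) and records the largest value the rounded product s*u reaches; it should come out just below C^2/4, i.e. about 0.248:

/* Illustrative check, not part of the original programs: track the largest
   rounded value of s*u over all float a in (0, 1]. In exact arithmetic
   s*u = a^2*(C - a^2) <= C^2/4, which is just under 0.248 < 1/4. */
#include <math.h>
#include <stdio.h>

int main() {
  const float C = 0x1.fde90cp-1f;
  float worst = 0.0f, worst_a = 0.0f;
  for (float a = 1.0f; a > 0.0f; a = nextafterf(a, -1.0/0.0)) {
    float s = a * a;
    float u = fmaf(a, -a, C);
    if (s * u > worst) { worst = s * u; worst_a = a; }
  }
  printf("max s*u = %a (about %.6f) at a = %a\n", worst, worst, worst_a);
  return 0;
}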
I found the coefficients with the aid of two programs:
- One program plugs in a bunch of test points, writes down a system of linear inequalities, and computes bounds on the coefficients from that system of inequalities. Notice that, given a, one can compute the range of r8 values that lead to a faithfully-rounded result (a rough sketch of this interval computation appears after this list). To get linear inequalities, I pretended r8 would be computed as a polynomial in the floats s and u in real-number arithmetic; the linear inequalities constrained this real-number r8 to lie in some interval. I used the Parma Polyhedra Library to handle these constraint systems.
- Another program randomly tested sets of coefficients in certain ranges, plugging in first a set of test points and then all floats from 1 to 1e-8 in descending order and checking that atan_poly produces a faithful rounding of atan((double)x). If some x failed, it printed out that x and why it failed.
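To make the first bullet concrete, here is a rough sketch (not the actual program) of that interval computation for a single test point. It assumes, as above, that only r9 and r10 are done in float arithmetic; the helper names r9_lo and r9_hi are ad hoc:

/* Illustration only: for one test point a, find (approximately) the range of
   real-valued r8 that still yields a faithful result, assuming r9 and r10
   are the only steps carried out in float arithmetic. */
#include <math.h>
#include <stdio.h>

int main() {
  float a = 0.75f;                    /* an arbitrary test point */
  double d = atan((double)a);
  float d1 = d, d2 = d;               /* the two floats bracketing atan(a) */
  if (d1 < d) d2 = nextafterf(d1, 1.0/0.0);
  else d1 = nextafterf(d1, -1.0/0.0);

  float s = a * a;

  /* Start from a float r9 guess near (atan(a) - a)/a and nudge it until
     fmaf(r9, a, a) lands on d1 or d2; fmaf(., a, a) is monotone in its
     first argument since a > 0. */
  float r9 = (float)((d - (double)a) / (double)a);
  while (fmaf(r9, a, a) > d2) r9 = nextafterf(r9, -1.0/0.0);
  while (fmaf(r9, a, a) < d1) r9 = nextafterf(r9, 1.0/0.0);

  /* Expand to the whole contiguous range of float r9 values that stay faithful. */
  float r9_lo = r9, r9_hi = r9;
  while (fmaf(nextafterf(r9_lo, -1.0/0.0), a, a) >= d1) r9_lo = nextafterf(r9_lo, -1.0/0.0);
  while (fmaf(nextafterf(r9_hi, 1.0/0.0), a, a) <= d2) r9_hi = nextafterf(r9_hi, 1.0/0.0);

  /* A real r8 rounds into [r9_lo, r9_hi] after multiplication by s roughly iff
     r8*s lies between the midpoints just outside the endpoints; those float
     midpoints are exactly representable in double. */
  double lo = ((double)r9_lo + (double)nextafterf(r9_lo, -1.0/0.0)) / 2;
  double hi = ((double)r9_hi + (double)nextafterf(r9_hi, 1.0/0.0)) / 2;
  printf("a = %a: real r8 roughly in [%a, %a]\n", a, lo / s, hi / s);
  return 0;
}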
To get coefficients, I hacked this first program to fix c3, work out bounds on r7 for each test point, then get bounds on the higher-order coefficients. Then I hacked it to fix c3 and c5 and get bounds on the higher-order coefficients. I did this until I had all but the three highest-order coefficients, c13, c15, and c17.
I grew the set of test points in the second program until it either stopped printing anything out or printed out "success". I needed surprisingly few test points to reject almost all wrong polynomials---I count 85 test points in the program.
Here I show some of my work selecting the coefficients. In order to get a faithfully-rounded arctan for my initial set of test points, assuming r1 through r8 are evaluated in real arithmetic (and rounded somehow unpleasantly, but in a way I can't remember) but r9 and r10 are evaluated in float arithmetic, I need:
-0x1.b81b456625f15p-3 <= c3 <= -0x1.b81b416e22329p-3
-0x1.e71d48d9c2ca4p-4 <= c5 <= -0x1.e71783472f5d1p-4
0x1.80e063cb210f9p-5 <= c7 <= 0x1.80ed6efa0a369p-5
0x1.1a3925ea0c5a9p-5 <= c9 <= 0x1.1b3783f148ed8p-5
-0x1.ec6032f293143p-7 <= c11 <= -0x1.e928025d508p-7
-0x1.8c06e851e2255p-7 <= c13 <= -0x1.732b2d4677028p-7
0x1.2aff33d629371p-8 <= c15 <= 0x1.41e9bc01ae472p-8
0x1.1e22f3192fd1dp-9 <= c17 <= 0x1.d851520a087c2p-9
Taking c3 = -0x1.b81b44p-3, assuming r8 is also evaluated in float arithmetic:
-0x1.e71df05b5ad56p-4 <= c5 <= -0x1.e7175823ce2a4p-4
0x1.80df529dd8b18p-5 <= c7 <= 0x1.80f00e8da7f58p-5
0x1.1a283503e1a97p-5 <= c9 <= 0x1.1b5ca5beeeefep-5
-0x1.ed2c7cd87f889p-7 <= c11 <= -0x1.e8c17789776cdp-7
-0x1.90759e6defc62p-7 <= c13 <= -0x1.7045e66924732p-7
0x1.27eb51edf324p-8 <= c15 <= 0x1.47cda0bb1f365p-8
0x1.f6c6b51c50b54p-10 <= c17 <= 0x1.003a00ace9a79p-8
Taking c5 = -0x1.e71aa4p-4, assuming r7 is done in float arithmetic:
0x1.80e3dcc972cb3p-5 <= c7 <= 0x1.80ed1cf56977fp-5
0x1.1aa005ff6a6f4p-5 <= c9 <= 0x1.1afce9904742p-5
-0x1.ec7cf2464a893p-7 <= c11 <= -0x1.e9d6f7039db61p-7
-0x1.8a2304daefa26p-7 <= c13 <= -0x1.7a2456ddec8b2p-7
0x1.2e7b48f595544p-8 <= c15 <= 0x1.44437896b7049p-8
0x1.396f76c06de2ep-9 <= c17 <= 0x1.e3bedf4ed606dp-9
Taking c7 = 0x1.80e87cp-5, assuming r6 is done in float arithmetic:
0x1.1aa86d25bb64fp-5 <= c9 <= 0x1.1aca48cd5caabp-5
-0x1.eb6311f6c29dcp-7 <= c11 <= -0x1.eaedb032dfc0cp-7
-0x1.81438f115cbbp-7 <= c13 <= -0x1.7c9a106629f06p-7
0x1.36d433f81a012p-8 <= c15 <= 0x1.3babb57bb55bap-8
0x1.5cb14e1d4247dp-9 <= c17 <= 0x1.84f1151303aedp-9
Taking c9 = 0x1.1ab95ap-5, assuming r5 is done in float arithmetic:
-0x1.eb51a3b03781dp-7 <= c11 <= -0x1.eb21431536e0dp-7
-0x1.7fcd84700f7cfp-7 <= c13 <= -0x1.7ee38ee4beb65p-7
0x1.390fa00abaaabp-8 <= c15 <= 0x1.3b100a7f5d3cep-8
0x1.6ff147e1fdeb4p-9 <= c17 <= 0x1.7ebfed3ab5f9bp-9
I picked a point close to the middle of the range for c11 and randomly chose c13, c15, and c17.
EDIT: I've now automated this procedure. The following function is also a faithfully-rounded implementation of arctan on [0, 1]:
float c5 = 0x1.997a72p-3;
float c7 = -0x1.23176cp-3;
float c9 = 0x1.b523c8p-4;
float c11 = -0x1.358ff8p-4;
float c13 = 0x1.61c5c2p-5;
float c15 = -0x1.0b16e2p-6;
float c17 = 0x1.7b422p-9;

float juffa_poly (float a) {
  float s = a * a;
  float r1 = c17;
  float r2 = fmaf (r1, s, c15);
  float r3 = fmaf (r2, s, c13);
  float r4 = fmaf (r3, s, c11);
  float r5 = fmaf (r4, s, c9);
  float r6 = fmaf (r5, s, c7);
  float r7 = fmaf (r6, s, c5);
  float r8 = fmaf (r7, s, -0x1.5554dap-2f);  /* c3, inlined */
  float r9 = r8 * s;
  float r10 = fmaf (r9, a, a);
  return r10;
}
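The harness above can exercise this version too; one minimal way to do it (a sketch, with checkit generalized to take a function pointer so the same exhaustive loop covers either polynomial) is:

/* Sketch: the exhaustive check from above, generalized via a function
   pointer so it can test atan_poly or juffa_poly unchanged. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int checkit_fn(float f, float (*poly)(float)) {
  double d = atan(f);
  float d1 = d, d2 = d;
  if (d1 < d) d2 = nextafterf(d1, 1.0/0.0);
  else d1 = nextafterf(d1, -1.0/0.0);
  float p = poly(f);
  return p == d1 || p == d2;
}

int main() {
  for (float f = 1; f > 1e-16; f = nextafterf(f, -1.0/0.0)) {
    if (!checkit_fn(f, juffa_poly)) abort();
  }
  printf("success\n");
  exit(0);
}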
I find it surprising that this code even exists. For coefficients near these, the best bound you can get on the distance between r10 and the value of the polynomial evaluated in real arithmetic is on the order of a few ulps, owing to the slow convergence of this polynomial when s is near 1. I had expected roundoff error to behave in a way that was fundamentally "untamable" simply by means of tweaking coefficients.