It depends what else you're doing with the numbers. You can trade off a slight loss in space efficiency and a modest loss in efficiency of multiprecision arithmetic in return for very efficient conversion to and from decimal. The key is to do multiprecision arithmetic with a base that is a power of 10 rather than a power of 2.
For example, you might use base 10,000, where you pack one digit into a 16-bit word and you do your arithmetic on digits in 32-bit integers. (If you're on a 64-bit machine you can double that and do base 1,000,000,000.) This kind of code is relatively efficient timewise, although not quite as fast as using the native power of two because you can't take advantage of the carry bit on the hardware.
And you can't represent as many integers in the same number of bits.
But it's a whiz at converting to and from decimal, because you get to convert the individual digits without any long division.
If you need to represent the full range of numbers from zero to ((1 << 128) - 1)
, you can still do this, but add an extra digit, so your numbers will be bigger.
If it turns out you really need the extra space/speed (maybe you're doing a lot of cryptographic 128-bit calculations) then the method of simultanous div/mod by 10 is the fastest method I know. The only other trick is that if small integers are common, you can handle them specially. (That is, if the three most significant 32-bit words are all zero, just use the native division to convert.)
Is there a clever way to implement this function that I'm not seeing?
Dave Hanson's C Interfaces and Implementations has a lengthy chapter on multiprecision arithmetic. Dividing a large number by a single digit is a special case that has this efficient implementation:
int XP_quotient(int n, T z, T x, int y) {
int i;
unsigned carry = 0;
for (i = n - 1; i >= 0; i--) {
carry = carry*BASE + x[i];
z[i] = carry/y;
carry %= y;
}
return carry;
}
For full understanding, it really helps to have the book, but the source code is still a lot easier to understand than the GNU source code. And you could easily adapt it to use base 10,000 (it currently uses base 256).
Summary: if your performance bottleneck is conversion to decimal, implement multiprecision arithmetic with a base that is a power of 10. If your machine's native word size is 32 and you are using C code, use 10,000 in a 16-bit word.