I have a C program which uses GCC's __uint128_t
which is great, but now my needs have grown beyond it.
What are my options for fast arithmetic with 196 or 256 bits?
The only operation I need is addition (and I don't need the carry bit, i.e., I will be working mod 2192 or 2256).
Speed is important, so I don't want to move to a general multi-precision if at all possible. (In fact my code does use multi-precision in some places, but this is in the critical loop and will run tens of billions of times. So far the multi-precision needs to run only tens of thousands of times.)
Maybe this is simple enough to code directly, or maybe I need to find some appropriate library.
What is your advice, Oh great Stack Overflow?
Clarification: GMP is too slow for my needs. Although I actually use multi-precision in my code it's not in the inner loop and runs less than 105 times. The hot loop runs more like 1012 times. When I changed my code (increasing a size parameter) so that the multi-precision part ran more often vs. the single-precision, I had a 100-fold slowdown (mostly due to memory management issues, I think, rather than extra μops). I'd like to get that down to a 4-fold slowdown or better.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…