Working with uint128
? If you can, use the x86 SSE instructions, which were designed for exactly that. (Then, when you've bitshifted your value, you're ready to do other 128-bit operations...)
SSE2 bit shifts take ~4 instructions on average, with one branch (a case statement). No issues with shifting more than 32 bits, either. The full code for doing this is, using gcc intrinsics rather than raw assembler, is in sseutil.c
(github: "Unusual uses of SSE2") -- and it's a bit bigger than makes sense to paste here.
The hurdle for many people in using SSE2 is that shift ops take immediate (constant) shift counts. You can solve that with a bit of C preprocessor twiddling (wordpress: C preprocessor tricks). After that, you have op sequences like:
LeftShift(uint128 x, int n) = _mm_slli_epi64(_mm_slli_si128(x, n/8), n%8)
for n = 65..71, 73..79, … 121..127
... doing the whole shift in two instructions.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…