I want to do a nested loop of the type:
for i0 in range(n):
    for i1 in range(n):
        ....
            for iN in range(n):
                #something
However, I want to keep the number of nested loops N a variable. My current implementation looks something like this:
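import numpy as np
from numba import jit

# n (range of each index) and N are module-level globals here;
# numba treats globals as compile-time constants
@jit(nopython=True)
def function():
    i = np.empty(N+1, np.int32)  # init index array
    for j in range(n**(N+1)):
        i = get_indices(i, j, N+1)  # decode flat counter j into N+1 indices
        #something

@jit(nopython=True)
def get_indices(i, j, N):
    # write the base-n digits of j into i, least significant digit first
    for k in range(N):
        i[k] = j % n
        j = j // n
    return i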
However, this second implementation is slower because of the additional computation: on every iteration, get_indices recomputes all N+1 indices from j with one modulo and one integer division per index (in some cases this makes my code run about 40% slower). Is there any way to achieve the speed of the first variant while keeping N a variable?
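One way to avoid decoding every index from j is to update the index array incrementally, like an odometer (a plain mixed-radix counter): increment i[0], and only when it reaches n reset it and carry into i[1], and so on. Digit k only changes once every n**k steps, so the average work per step is a small constant instead of N+1 divisions. The sketch below is a minimal version under the assumption that it is written in the same style as get_indices; the names next_indices and all_index_tuples are hypothetical, it uses only integer arithmetic on a NumPy array (which should be numba-friendly, but that is untested here), and whether it closes the gap to the explicit loops would need benchmarking.

```python
import numpy as np


def next_indices(i, n):
    """Advance the index array i in place to the next tuple, like an odometer.

    i[0] is the fastest-changing digit, which matches the order produced by
    get_indices (i[k] == (j // n**k) % n). Returns False once every tuple has
    been visited, at which point i has wrapped back to all zeros.
    """
    for k in range(i.shape[0]):
        i[k] += 1
        if i[k] < n:
            return True  # no carry needed: only this digit changed
        i[k] = 0  # digit overflowed: reset it and carry into the next one
    return False


def all_index_tuples(n, depth):
    """Collect every index tuple in the order next_indices visits them."""
    i = np.zeros(depth, dtype=np.int64)
    out = []
    while True:
        out.append(tuple(int(x) for x in i))  # "#something" would use i here
        if not next_indices(i, n):
            break
    return out
```

For example, all_index_tuples(2, 2) visits (0, 0), (1, 0), (0, 1), (1, 1), the same order as get_indices for j = 0, 1, 2, 3.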
Edit:
In my case, "#something" is:
temp_F = 1 + 0*1j
for kpr in range(N+1):
    temp_F = temp_F * F[kpr,0,i[kpr],i[0]]
    for kprpr in range(1,kpr+1):
        temp_F = temp_F * F[kpr-kprpr+1,1,i[kpr],i[kprpr]]
temp_G = 1 + 0*1j
for k in range(N):
    temp_G = temp_G * G[i[k+1],i[k]]
U[i[N],i[0]] += temp_G * temp_F
Here F and G are given arrays, and U is filled with sums over i_1, ..., i_{N-1}, so the order of iteration does not matter.
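When comparing faster variants, a slow but obviously correct reference for the whole sum can be useful. The sketch below is plain Python (not meant to run under numba) and assumes array shapes inferred from the indexing above: F is (N+1, 2, n, n), G is (n, n), and U is a complex (n, n) array. The names summand and reference_U are hypothetical. It uses itertools.product(range(n), repeat=N+1), which yields exactly the tuples of N+1 nested range(n) loops.

```python
import itertools

import numpy as np


def summand(F, G, i, N):
    """The "#something" term for one index tuple i = (i_0, ..., i_N)."""
    temp_F = 1 + 0j
    for kpr in range(N + 1):
        temp_F *= F[kpr, 0, i[kpr], i[0]]
        for kprpr in range(1, kpr + 1):
            temp_F *= F[kpr - kprpr + 1, 1, i[kpr], i[kprpr]]
    temp_G = 1 + 0j
    for k in range(N):
        temp_G *= G[i[k + 1], i[k]]
    return temp_G * temp_F


def reference_U(F, G, n, N):
    """Slow reference: accumulate the term over all n**(N+1) index tuples.

    Assumed shapes: F is (N+1, 2, n, n), G is (n, n); returns complex (n, n).
    """
    U = np.zeros((n, n), dtype=np.complex128)
    # Same tuples as N+1 nested range(n) loops; the order is irrelevant
    # because U only accumulates.
    for i in itertools.product(range(n), repeat=N + 1):
        U[i[N], i[0]] += summand(F, G, i, N)
    return U
```

As a sanity check, for N = 1 there is nothing to sum over and the term reduces to U[b, a] = G[b, a] * F[0, 0, a, a] * F[1, 0, b, a] * F[1, 1, b, b].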
Edit2:
I have inserted the first answer into my code and found:
Here U_new is my first variant (the explicit nested loops), U is my second variant, and itetare is the variant proposed by the first answer.
So it seems that Algorithm H from section 7.2.1.1 of TAOCP does not match the performance of the simple nested loops.
Also (maybe I should have mentioned this earlier): parallelization would be of some interest to me. With the simple nested loops, for example, one could just use prange instead of range.
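Because every term is added into U[i_N, i_0], one way to parallelize a variable-depth version is to split the work over i_0 with numba's prange and walk the remaining N indices with an odometer-style counter. All writes for a given i_0 go to column U[:, i_0], so different prange iterations never touch the same element and no atomics or per-thread copies of U are needed. This is a sketch under assumptions: numba is installed (otherwise it falls back to serial plain Python), F is (N+1, 2, n, n) and G is (n, n), and the names summand and compute_U_parallel are hypothetical. It has not been benchmarked, and with only n-way parallelism the speedup may be limited when n is small.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # no numba: fall back to plain Python so the sketch still runs
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as bare @njit
        return lambda f: f  # used as @njit(...)


@njit
def summand(F, G, i, N):
    # The "#something" term from the question for one index tuple i.
    temp_F = 1 + 0j
    for kpr in range(N + 1):
        temp_F *= F[kpr, 0, i[kpr], i[0]]
        for kprpr in range(1, kpr + 1):
            temp_F *= F[kpr - kprpr + 1, 1, i[kpr], i[kprpr]]
    temp_G = 1 + 0j
    for k in range(N):
        temp_G *= G[i[k + 1], i[k]]
    return temp_G * temp_F


@njit(parallel=True)
def compute_U_parallel(F, G, n, N):
    U = np.zeros((n, n), dtype=np.complex128)
    # Parallelize over i_0 only: each iteration writes only to column U[:, i_0].
    for i0 in prange(n):
        i = np.zeros(N + 1, dtype=np.int64)  # allocated in the loop body, so private
        i[0] = i0
        while True:
            U[i[N], i[0]] += summand(F, G, i, N)
            # Odometer step over i[1..N]; i[0] stays fixed in this iteration.
            k = 1
            while k <= N:
                i[k] += 1
                if i[k] < n:
                    break
                i[k] = 0  # overflow: reset this index and carry into the next
                k += 1
            if k > N:  # carried past the last index: all tuples for this i0 done
                break
    return U
```

Splitting on the outermost index mirrors what replacing range with prange in the explicit nested loops would do, while N stays a runtime argument.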