performance of N-dimensional nested loops in Python using jit

asked Feb 17, 2021 in Technique by 深蓝 (71.8m points)

I want to do a nested loop of the type:

for i0 in range(n):
    for i1 in range(n):
        ....
            for iN in range(n):
                #something

However, I want to keep the number of nested loops N a variable. My current implementation looks something like this:

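import numpy as np
from numba import jit

# n (range of each index) and N (number of loops minus one) are assumed to be
# module-level constants defined before the first call.

@jit(nopython=True)
def function():
    i = np.empty(N+1, np.int32)  # index array, overwritten on every iteration

    for j in range(n**(N+1)):  # one flat loop over all n**(N+1) index tuples
        i = get_indices(i, j, N+1)
        #something

@jit(nopython=True)
def get_indices(i, j, N):
    # write the base-n digits of j into i, least significant digit first
    for k in range(N):
        i[k] = j % n
        j = j // n
    return i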

However, the second implementation is slower because of the extra index computation on every iteration (in some cases it makes my code run about 40% slower). Is there any way to match the speed of the first variant while keeping N variable?
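To make the mapping concrete: get_indices writes j in base n, least significant digit first, so i[0] plays the role of the innermost loop. A small pure-Python check of that claim follows (no Numba; `decode` is a hypothetical stand-in for get_indices, and the sizes are chosen only for illustration):

```python
from itertools import product

def decode(j, n, length):
    """Return the base-n digits of j, least significant digit first."""
    digits = []
    for _ in range(length):
        digits.append(j % n)
        j //= n
    return digits

n, length = 3, 2
# Index tuples produced by the flat loop over j.
flat = [decode(j, n, length) for j in range(n ** length)]
# itertools.product varies its LAST element fastest, so reverse each tuple
# to compare against the "first digit fastest" order of decode.
nested = [list(reversed(t)) for t in product(range(n), repeat=length)]
```

Both lists contain the same tuples in the same order, so the flat loop visits exactly what the explicit nested loops visit; the cost difference is the modulo and division per digit.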

Edit: "#something" is in my case

    temp_F = 1 + 0*1j
    for kpr in range(N+1):
        temp_F = temp_F * F[kpr,0,i[kpr],i[0]]
        for kprpr in range(1,kpr+1):
            temp_F = temp_F * F[kpr-kprpr+1,1,i[kpr],i[kprpr]]

    temp_G = 1 + 0*1j
    for k in range(N):
        temp_G = temp_G * G[i[k+1],i[k]]

    U[i[N],i[0]] += temp_G * temp_F

Here F and G are given arrays, and U is to be filled with sums over i_1, ..., i_{N-1}, so the order of iteration does not matter.
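Read off the loop body above, the quantity being accumulated can be written as a single formula. This is only a transcription of the code, assuming U starts out filled with zeros and that i_0, ..., i_N each run over range(n):

```latex
U_{i_N,\, i_0} = \sum_{i_1, \dots, i_{N-1}}
  \left( \prod_{k=0}^{N-1} G_{i_{k+1},\, i_k} \right)
  \prod_{k'=0}^{N} \left( F_{k',\, 0,\, i_{k'},\, i_0}
  \prod_{k''=1}^{k'} F_{k'-k''+1,\, 1,\, i_{k'},\, i_{k''}} \right)
```

Here k' and k'' correspond to kpr and kprpr in the code, the first product is temp_G, and the second is temp_F.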

Edit2: I have inserted the first answer into my code and found:

Here U_new is my first variant with the explicit nested loops, U is my second variant, and iterate is the variant proposed in the first answer. From this it seems that algorithm 7.2.1.1H from TAOCP does not match the performance of the simple nested loops. Also (maybe I should have mentioned this earlier): parallelization would be of some interest to me. In the simple nested loops, for example, one could just use prange instead of range.
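On the parallelization point, one possible sketch (not benchmarked, and not taken from the answer) is to run prange over the last index i[N] only and step the remaining indices with an odometer-style increment, which avoids the modulo and division. Since each update writes to row i[N] of the output, every parallel iteration touches its own row. The function name `count_visits` and the counter array `hits` are hypothetical; the counter stands in for "#something". The try/except fallback is only there so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python so the sketch still runs
    prange = range

    def njit(*args, **kwargs):
        return lambda f: f


@njit(parallel=True)
def count_visits(n, N):
    # hits[r, c] counts the index tuples with i[N] == r and i[0] == c.
    hits = np.zeros((n, n), dtype=np.int64)
    for last in prange(n):  # parallel over i[N]; each iteration owns row `last`
        i = np.zeros(N + 1, dtype=np.int64)
        i[N] = last
        for _ in range(n ** N):  # all combinations of i[0], ..., i[N-1]
            hits[last, i[0]] += 1  # stand-in for "#something"
            # Odometer increment of i[0..N-1]: bump the lowest index,
            # carrying into the next one whenever an index wraps to 0.
            k = 0
            while k < N:
                i[k] += 1
                if i[k] < n:
                    break
                i[k] = 0
                k += 1
    return hits


hits = count_visits(3, 2)
```

Most increments stop at k = 0, so the per-iteration cost of advancing the indices is close to that of the innermost explicit loop. Whether this recovers the full speed of the hand-written nested loops would have to be measured.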


1 Answer

深蓝 · Answer 1 · 2021-02-16T17:34:46+0000

If the order is not important we can implement algorithm 7.2.1.1H from TAOCP Volume 4A:

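import numpy as np
from numba import jit

@jit(nopython=True)
def iterate(n, N):
    a = np.zeros(N+1, dtype=np.int32)      # current index tuple
    f = np.arange(0, N+2, dtype=np.int32)  # focus pointers
    o = np.ones(N+1, dtype=np.int32)       # direction (+1 or -1) per position

    while True:
        # Do something with a.
        print(a)

        j = f[0]  # position to change next
        f[0] = 0
        if j == N+1:
            break  # all tuples have been visited
        a[j] += o[j]
        if a[j] == 0 or a[j] == n - 1:
            # position j hit a boundary: reverse it and move the focus on
            o[j] = -o[j]
            f[j] = f[j+1]
            f[j+1] = j + 1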

This steps through all (N+1)-tuples with entries in range(n), changing only a single element by one at a time. E.g. for n = 3 and N = 2:

[0 0 0]
[1 0 0]
[2 0 0]
[2 1 0]
[1 1 0]
[0 1 0]
[0 2 0]
[1 2 0]
[2 2 0]
[2 2 1]
[1 2 1]
[0 2 1]
[0 1 1]
[1 1 1]
[2 1 1]
[2 0 1]
[1 0 1]
[0 0 1]
[0 0 2]
[1 0 2]
[2 0 2]
[2 1 2]
[1 1 2]
[0 1 2]
[0 2 2]
[1 2 2]
[2 2 2]
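The two properties visible in the listing above (every tuple appears exactly once, and consecutive tuples differ in one position by 1) can be checked with a pure-Python port of the same loop. This is a sketch for verification only, not a performance suggestion; `gray_tuples` is a hypothetical name, it takes the tuple length directly instead of N, and it assumes n >= 2.

```python
from itertools import product

def gray_tuples(n, length):
    """Yield every tuple in range(n)**length, changing one entry by +/-1 per step.

    Assumes n >= 2.
    """
    a = [0] * length
    f = list(range(length + 1))  # focus pointers
    o = [1] * length             # direction per position
    while True:
        yield tuple(a)
        j = f[0]
        f[0] = 0
        if j == length:
            return
        a[j] += o[j]
        if a[j] == 0 or a[j] == n - 1:
            o[j] = -o[j]
            f[j] = f[j + 1]
            f[j + 1] = j + 1

seq = list(gray_tuples(3, 3))
# Every tuple is visited exactly once ...
all_visited = len(seq) == 27 and set(seq) == set(product(range(3), repeat=3))
# ... and neighbours differ by 1 in exactly one position.
single_steps = all(
    sum(abs(x - y) for x, y in zip(s, t)) == 1 for s, t in zip(seq, seq[1:])
)
```

Because only one index changes per step, a loop body that is a product over the indices could in principle be updated incrementally rather than recomputed, but that is a separate optimization from the iteration itself.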
