Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
386 views
in Technique[技术] by (71.8m points)

performance of N-dimensional nested loops in Python using jit

I want to do a nested loop of the type:

for i0 in range(n):
    for i1 in range(n):
        ....
            for iN in range(n):
                #something

However, I want to keep the number of nested loops N a variable. My current implementation looks something like this:

@jit(nopython=True)
def function():
    i = np.empty(N+1,np.int32) # init index array

    for j in range(n**(N+1)):
        i = get_indices(i,j,N+1)
        #something

@jit(nopython=True)
def get_indices(i,j,N):
    for k in range(N):
        i[k] = j % n
        j = j // n
    return i

However, the second implementation is slower as there is an additional computation to be done (in some cases it makes my code run about 40% slower). Is there any way to achieve the speed of the first variant while keeping N a variable?

Edit: "#something" is in my case

    temp_F = 1 + 0*1j
    for kpr in range(N+1):
        temp_F = temp_F * F[kpr,0,i[kpr],i[0]]
        for kprpr in range(1,kpr+1):
            temp_F = temp_F * F[kpr-kprpr+1,1,i[kpr],i[kprpr]]

    temp_G = 1 + 0*1j
    for k in range(N):
        temp_G = temp_G * G[i[k+1],i[k]]

    U[i[N],i[0]] += temp_G * temp_F

Where F and G are given arrays and U is to be filled with sums over i_1,...,i_{N-1} so order does not matter.

Edit2: I have inserted the first answer into my code and found: U_new is my first variant with the explicit nested loops, U is my second variant and itetare is the variant proposed by the first answer

Where U_new is my first variant with the explicit nested loops, U is my second variant and itetare is the variant proposed by the first answer. So from here it seems like the performance of 7.2.1.1H from TAOCP does not match the performance of the simple nested loops. Also: (maybe I should habe mentioned this earlier) parallelization would be of some interest to me. (In the simple nested loops for example one could just use prange instead of range)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If the order is not important we can implement algorithm 7.2.1.1H from TAOCP Volume 4A:

@jit(nopython=True)
def iterate(n, N):
    a = np.zeros(N+1, dtype=np.int32)
    f = np.arange(0, N+2, dtype=np.int32)
    o = np.ones(N+1, dtype=np.int32)

    while True:
        # Do something with a.
        print(a)

        j = f[0]
        f[0] = 0
        if j == N+1: break
        a[j] += o[j]
        if a[j] == 0 or a[j] == n - 1:
            o[j] = -o[j]
            f[j] = f[j+1]
            f[j+1] = j + 1

This steps through all N-tuples, changing only a single element at a time. E.g.:

[0 0 0]
[1 0 0]
[2 0 0]
[2 1 0]
[1 1 0]
[0 1 0]
[0 2 0]
[1 2 0]
[2 2 0]
[2 2 1]
[1 2 1]
[0 2 1]
[0 1 1]
[1 1 1]
[2 1 1]
[2 0 1]
[1 0 1]
[0 0 1]
[0 0 2]
[1 0 2]
[2 0 2]
[2 1 2]
[1 1 2]
[0 1 2]
[0 2 2]
[1 2 2]
[2 2 2]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...