Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
387 views
in Technique[技术] by (71.8m points)

python - 是否存在一种无需重复计算即可找到多个2d点之间距离的有效算法?(Is there an efficient algorithm for finding the distance between multiple 2d points without repeating a calculation?)

I am trying to create a Python function which will take x coordinates and y coordinates as an input and calculate the distances between all of the data points.

(我正在尝试创建一个Python函数,该函数将x坐标和y坐标作为输入并计算所有数据点之间的距离。)

The distances should be stored as a list (or array) and passed back to the calling program.

(距离应存储为列表(或数组),并传递回调用程序。)

The algorithm I am starting with looks like the example below.

(我开始使用的算法如下例所示。)

def distance(x, y):
    dist = []
    for j in range(len(x)):
        for i in range(len(y)):
            """
            Don't calculate the distance between the same point
            since it will obviously be zero
            """
            if j != i:
                mag = (x[j] - x[i]) ** 2.0 + (y[j] - y[i]) ** 2.0
                dist.append(np.sqrt(mag))
    return dist

x_vals = [2.3, 3.6, 1.8]
y_vals = [1.6, 4.8, 2.8]

vals = distance(x_vals, y_vals)
print(vals)

This algorithm will calculate the distance between points 1-2, 1-3, 2-1, 2-3, 3-1, and 3-2, returning the following lists

(此算法将计算点1-2、1-3、2-1、2-3、3-1和3-2之间的距离,并返回以下列表)

[3.4539832078341086, 1.2999999999999996, 3.4539832078341086, 2.6907248094147422, 1.2999999999999996, 2.6907248094147422]

While the results are correct, the algorithm repeats measurements.

(当结果正确时,算法将重复测量。)

As you can see the distance from point 1-2 is the same as 2-1, and the distances between 1-3 is the same as 3-1, as well as 2-3 is the same as 3-2.

(如您所见,到点1-2的距离与2-1相同,并且在1-3之间的距离与3-1相同,而在2-3处与3-2相同。)

In other words, I would like to create a more efficient algorithm that only calculates between 1-2, 1-3, and 2-3.

(换句话说,我想创建一种仅在1-2、1-3和2-3之间进行计算的更有效的算法。)

While this sample only contains 3 data points (ie 3 pairs of x and y coordinates), I would like this algorithm to be applicable to a much larger number of datapoints, and be as efficient as possible since this could be applied to a large number of data points.

(尽管此样本仅包含3个数据点(即3对x和y坐标),但我希望此算法适用于大量数据点,并应尽可能高效,因为它可以应用于大量数据点。数据点。)

  ask by Jon translate from so

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This should be faster since it does not use explicit for loops

(这应该更快,因为它不使用显式的for循环)

from itertools import combinations
from math import sqrt

def dist(x_vals, y_vals):
  " Distance of pair combinations of x_vals & y_vals "

  # Distance between zipped pairs
  dist2 = lambda z: sqrt((z[0][0] - z[1][0]) ** 2.0 + (z[0][1]- z[1][1]) ** 2.)

  # Use combinations to create desired distance pairs (i.e. 1-2, 1-3, 2-3, etc.)
  return list(map(dist2, combinations(zip(x_vals, y_vals), 2)))

Test (测试)

x_vals = [2.3, 3.6, 1.8]
y_vals = [1.6, 4.8, 2.8]
print(dist(x_vals, y_vals))
# >> [3.4539832078341086, 1.2999999999999996, 2.69072480941474227422]

Performance Testing (性能测试)

Small Data Test

(小数据测试)

Comparing posted solutions--list comprehension (6502 posting) and current posting (darrlg) based upon map & combinations are the fastest on small datasets.

(比较已发布的解决方案-基于地图和组合的列表理解(6502发布)和当前发布(darrlg)在小型数据集上最快。)

Original Data (Small): 
 x_vals = [2.3, 3.6, 1.8]
 y_vals = [1.6, 4.8, 2.8]
  1. Jon Original posting: 22.5 μs ± 853 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    (乔恩(Jon)原始发布:每个循环22.5 μs ±853 ns(平均±标准偏差,共运行7次,每个循环10000次))

  2. Jon Updated posting: 19.5 μs ± 700 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    (乔恩(Jon)更新过的发布:每个循环19.5 μs ±700 ns(平均±标准偏差,共运行7次,每个循环100000次))

  3. MTRW posting (scipy unique answers) 18.8 μs ± 929 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    (MTRW投递(科学的独特答案),每个循环18.8 μs ±929 ns(平均±标准偏差,共运行7次,每个循环10000次))

  4. 6502 posting (based upon list comprehension) 5.85 μs ± 239 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    (6502发布(基于列表理解)每个循环5.85 μs ±239 ns(平均±标准偏差,共运行7次,每个循环100000次))

  5. Current solution: 5.74 μs ± 195 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    (当前解决方案:每个回路5.74 μs ±195 ns(平均值±标准偏差,共运行7次,每个回路100000个))

Larger Data Test (vector length 1000)

(更大的数据测试(向量长度1000))

Result: Scipy is much faster for larger datasets

(结果:Scipy对于较大的数据集要快得多)

Data
N = 1000
x_vals = [random.randrange(N) for _ in range(N)]
y_vals = [random.randrange(N) for _ in range(N)]
  1. Jon Original Code: 3.2 s ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    (乔恩(Jon)原始代码:每个循环3.2 s ±20毫秒(平均±标准偏差,共运行7次,每个循环1次))

  2. Jon Updated Code: 1.77 s ± 39.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    (乔恩(Jon)更新的代码:每个循环1.77 s ±39.6 ms(平均±标准偏差,共运行7次,每个循环1次))

  3. MTRW posting (scipy unique answers) 7.65 ms ± 80.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    (MTRW发布(科学的独特答案)每个循环7.65毫秒 ±80.1微秒(平均±标准偏差,共运行7次,每个循环100个循环))

  4. 6502 posting (based upon list comprehension) 910 ms ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    (6502发布(基于列表理解)每个循环910毫秒 ±16.5毫秒(平均±标准偏差,共运行7次,每个循环1次))

  5. Current solution: 687 ms ± 7.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    (当前解决方案:每个循环687毫秒 ±7.92毫秒(平均±标准偏差,共运行7次,每个循环1次))


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...