python - 是否存在一种无需重复计算即可找到多个2d点之间距离的有效算法？(Is there an efficient algorithm for finding the distance between multiple 2d points without repeating a calculation?)

Question

Welcome To Ask or Share your Answers For Others

python - 是否存在一种无需重复计算即可找到多个2d点之间距离的有效算法？(Is there an efficient algorithm for finding the distance between multiple 2d points without repeating a calculation?)

asked Mar 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - 是否存在一种无需重复计算即可找到多个2d点之间距离的有效算法？(Is there an efficient algorithm for finding the distance between multiple 2d points without repeating a calculation?)

I am trying to create a Python function which will take x coordinates and y coordinates as an input and calculate the distances between all of the data points.

(我正在尝试创建一个Python函数，该函数将x坐标和y坐标作为输入并计算所有数据点之间的距离。)

The distances should be stored as a list (or array) and passed back to the calling program.

(距离应存储为列表（或数组），并传递回调用程序。)

The algorithm I am starting with looks like the example below.

(我开始使用的算法如下例所示。)

def distance(x, y):
    dist = []
    for j in range(len(x)):
        for i in range(len(y)):
            """
            Don't calculate the distance between the same point
            since it will obviously be zero
            """
            if j != i:
                mag = (x[j] - x[i]) ** 2.0 + (y[j] - y[i]) ** 2.0
                dist.append(np.sqrt(mag))
    return dist

x_vals = [2.3, 3.6, 1.8]
y_vals = [1.6, 4.8, 2.8]

vals = distance(x_vals, y_vals)
print(vals)

This algorithm will calculate the distance between points 1-2, 1-3, 2-1, 2-3, 3-1, and 3-2, returning the following lists

(此算法将计算点1-2、1-3、2-1、2-3、3-1和3-2之间的距离，并返回以下列表)

[3.4539832078341086, 1.2999999999999996, 3.4539832078341086, 2.6907248094147422, 1.2999999999999996, 2.6907248094147422]

While the results are correct, the algorithm repeats measurements.

(当结果正确时，算法将重复测量。)

As you can see the distance from point 1-2 is the same as 2-1, and the distances between 1-3 is the same as 3-1, as well as 2-3 is the same as 3-2.

(如您所见，到点1-2的距离与2-1相同，并且在1-3之间的距离与3-1相同，而在2-3处与3-2相同。)

In other words, I would like to create a more efficient algorithm that only calculates between 1-2, 1-3, and 2-3.

(换句话说，我想创建一种仅在1-2、1-3和2-3之间进行计算的更有效的算法。)

While this sample only contains 3 data points (ie 3 pairs of x and y coordinates), I would like this algorithm to be applicable to a much larger number of datapoints, and be as efficient as possible since this could be applied to a large number of data points.

(尽管此样本仅包含3个数据点（即3对x和y坐标），但我希望此算法适用于大量数据点，并应尽可能高效，因为它可以应用于大量数据点。数据点。)

ask by Jon translate from so

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-03-06T04:18:21+0000

This should be faster since it does not use explicit for loops

(这应该更快，因为它不使用显式的for循环)

from itertools import combinations
from math import sqrt

def dist(x_vals, y_vals):
  " Distance of pair combinations of x_vals & y_vals "

  # Distance between zipped pairs
  dist2 = lambda z: sqrt((z[0][0] - z[1][0]) ** 2.0 + (z[0][1]- z[1][1]) ** 2.)

  # Use combinations to create desired distance pairs (i.e. 1-2, 1-3, 2-3, etc.)
  return list(map(dist2, combinations(zip(x_vals, y_vals), 2)))

Test (测试)

x_vals = [2.3, 3.6, 1.8]
y_vals = [1.6, 4.8, 2.8]
print(dist(x_vals, y_vals))
# >> [3.4539832078341086, 1.2999999999999996, 2.69072480941474227422]

Performance Testing (性能测试)

Small Data Test

(小数据测试)

Comparing posted solutions--list comprehension (6502 posting) and current posting (darrlg) based upon map & combinations are the fastest on small datasets.

(比较已发布的解决方案-基于地图和组合的列表理解（6502发布）和当前发布（darrlg）在小型数据集上最快。)

Original Data (Small): 
 x_vals = [2.3, 3.6, 1.8]
 y_vals = [1.6, 4.8, 2.8]

Jon Original posting: 22.5 μs ± 853 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(乔恩（Jon）原始发布：每个循环22.5 μs ±853 ns（平均±标准偏差，共运行7次，每个循环10000次）)
Jon Updated posting: 19.5 μs ± 700 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
(乔恩（Jon）更新过的发布：每个循环19.5 μs ±700 ns（平均±标准偏差，共运行7次，每个循环100000次）)
MTRW posting (scipy unique answers) 18.8 μs ± 929 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(MTRW投递（科学的独特答案），每个循环18.8 μs ±929 ns（平均±标准偏差，共运行7次，每个循环10000次）)
6502 posting (based upon list comprehension) 5.85 μs ± 239 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
(6502发布（基于列表理解）每个循环5.85 μs ±239 ns（平均±标准偏差，共运行7次，每个循环100000次）)
Current solution: 5.74 μs ± 195 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
(当前解决方案：每个回路5.74 μs ±195 ns（平均值±标准偏差，共运行7次，每个回路100000个）)

Larger Data Test (vector length 1000)

(更大的数据测试（向量长度1000）)

Result: Scipy is much faster for larger datasets

(结果：Scipy对于较大的数据集要快得多)

Data
N = 1000
x_vals = [random.randrange(N) for _ in range(N)]
y_vals = [random.randrange(N) for _ in range(N)]

Jon Original Code: 3.2 s ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(乔恩（Jon）原始代码：每个循环3.2 s ±20毫秒（平均±标准偏差，共运行7次，每个循环1次）)
Jon Updated Code: 1.77 s ± 39.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(乔恩（Jon）更新的代码：每个循环1.77 s ±39.6 ms（平均±标准偏差，共运行7次，每个循环1次）)
MTRW posting (scipy unique answers) 7.65 ms ± 80.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(MTRW发布（科学的独特答案）每个循环7.65毫秒 ±80.1微秒（平均±标准偏差，共运行7次，每个循环100个循环）)
6502 posting (based upon list comprehension) 910 ms ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(6502发布（基于列表理解）每个循环910毫秒 ±16.5毫秒（平均±标准偏差，共运行7次，每个循环1次）)
Current solution: 687 ms ± 7.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(当前解决方案：每个循环687毫秒 ±7.92毫秒（平均±标准偏差，共运行7次，每个循环1次）)

Categories

python - 是否存在一种无需重复计算即可找到多个2d点之间距离的有效算法？(Is there an efficient algorithm for finding the distance between multiple 2d points without repeating a calculation?)

python - 是否存在一种无需重复计算即可找到多个2d点之间距离的有效算法？(Is there an efficient algorithm for finding the distance between multiple 2d points without repeating a calculation?)

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Test (测试)

Performance Testing (性能测试)

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags