Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Filter numpy array to retain only one row for a given value

I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:

  1. Identify coordinate pairs with duplicated x-values.
  2. Keep only the coordinate pair of those duplicates with the highest y-value.

For example, in the following array:

arr = [[1, 4],
       [1, 8],
       [2, 3],
       [4, 6],
       [4, 2],
       [5, 1],
       [5, 2],
       [5, 6]]

I would like the result to be:

arr = [[1, 8],
       [2, 3],
       [4, 6],
       [5, 6]]

I've explored np.unique and np.where but cannot figure out how to use them to solve this problem. Thanks so much!


1 Answer


Here's one way based on np.maximum.reduceat -

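import numpy as np

def grouby_maxY(a):
    # Sort rows by the x column; skip this if the first column is already sorted
    b = a[a[:,0].argsort()]
    # Start index of each run of equal x-values in the sorted array
    grp_idx = np.flatnonzero(np.r_[True,(b[:-1,0] != b[1:,0])])
    # Maximum y within each run
    grp_maxY = np.maximum.reduceat(b[:,1], grp_idx)
    return np.c_[b[grp_idx,0], grp_maxY]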

Alternatively, if you would like to bring in np.unique, it can compute grp_idx from the sorted array: np.unique(b[:,0], return_index=True)[1].
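
To make the np.unique variant concrete, here is a minimal sketch. The function name groupby_max_y_unique is made up for illustration and is not part of the original answer. It sorts by x, takes the first-occurrence indices from np.unique as the group starts, and feeds them to np.maximum.reduceat, applied here to the array from the question.

```python
import numpy as np

def groupby_max_y_unique(a):
    # Hypothetical helper name, introduced only for this sketch.
    a = np.asarray(a)
    # Sort rows by x so that equal x-values form contiguous runs.
    b = a[a[:, 0].argsort()]
    # On sorted input, return_index gives the start index of each run.
    keys, grp_idx = np.unique(b[:, 0], return_index=True)
    # Maximum y within each run.
    grp_max = np.maximum.reduceat(b[:, 1], grp_idx)
    return np.column_stack((keys, grp_max))

arr = np.array([[1, 4], [1, 8], [2, 3], [4, 6],
                [4, 2], [5, 1], [5, 2], [5, 6]])
result = groupby_max_y_unique(arr)
print(result)
```

Because np.unique returns the unique x-values as well, the result can be assembled from keys directly instead of indexing b[grp_idx, 0].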

Sample run -

In [453]: np.random.seed(0)

In [454]: arr = np.random.randint(0,5,(10,2))

In [455]: arr
Out[455]: 
array([[4, 0],
       [3, 3],
       [3, 1],
       [3, 2],
       [4, 0],
       [0, 4],
       [2, 1],
       [0, 1],
       [1, 0],
       [1, 4]])

In [456]: grouby_maxY(arr)
Out[456]: 
array([[0, 4],
       [1, 4],
       [2, 1],
       [3, 3],
       [4, 0]])
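
If the goal is to keep the original rows themselves (for example, when the array has more than two columns), a different technique from the reduceat approach above may be worth considering: sort by x and then y with np.lexsort, and keep the last row of each x-group. This is only a sketch under the assumption that the input is a non-empty 2-D array; the name keep_max_y_rows is hypothetical.

```python
import numpy as np

def keep_max_y_rows(a):
    # Hypothetical helper; assumes a non-empty 2-D array with x in column 0
    # and y in column 1.
    a = np.asarray(a)
    # lexsort uses the LAST key as the primary one: sort by x, then by y.
    order = np.lexsort((a[:, 1], a[:, 0]))
    b = a[order]
    # A row is the last of its x-group if the next row has a different x
    # (the final row always closes a group).
    is_last = np.r_[b[1:, 0] != b[:-1, 0], True]
    return b[is_last]

# The sample array from the run above, written out explicitly.
arr = np.array([[4, 0], [3, 3], [3, 1], [3, 2], [4, 0],
                [0, 4], [2, 1], [0, 1], [1, 0], [1, 4]])
filtered = keep_max_y_rows(arr)
print(filtered)
```

Since whole rows are selected with a boolean mask, any extra columns travel along with the chosen (x, y) pair.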
