python - How to find most frequent values in numpy ndarray?

Question

Welcome To Ask or Share your Answers For Others

python - How to find most frequent values in numpy ndarray?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - How to find most frequent values in numpy ndarray?

I have a numpy ndarray with shape of (30,480,640), the 1th and 2th axis representing locations(latitude and longitute), the 0th axis contains actual data points.I want to use the most frequent value along the 0th axis at each location, which is to construct a new array with shape of (1,480,640).ie:

>>> data
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[40, 40, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

(perform calculation)

>>> new_data 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])

The data points will contain negtive and positive floating numbers. How can I perform such calculations? Thanks a lot!

I tried with numpy.unique,but I got "TypeError: unique() got an unexpected keyword argument 'return_inverse'".I'm using numpy version 1.2.1 installed on Unix and it doesn't support return_inverse..I also tried mode,but it takes forever to process such large amount of data...so is there an alternative way to get the most frequent values? Thanks again.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:32:16+0000

To find the most frequent value of a flat array, use unique, bincount and argmax:

arr = np.array([5, 4, -2, 1, -2, 0, 4, 4, -6, -1])
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.bincount(indices))]

To work with a multidimensional array, we don't need to worry about unique, but we do need to use apply_along_axis on bincount:

arr = np.array([[5, 4, -2, 1, -2, 0, 4, 4, -6, -1],
                [0, 1,  2, 2,  3, 4, 5, 6,  7,  8]])
axis = 1
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]

With your data:

data = np.array([
   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[ 0,  1,  2,  3,  4],
    [ 5,  6,  7,  8,  9],
    [10, 11, 12, 13, 14],
    [15, 16, 17, 18, 19]],

   [[40, 40, 42, 43, 44],
    [45, 46, 47, 48, 49],
    [50, 51, 52, 53, 54],
    [55, 56, 57, 58, 59]]])
axis = 0
u, indices = np.unique(arr, return_inverse=True)
u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
                                None, np.max(indices) + 1), axis=axis)]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

NumPy 1.2, really? You can approximate np.unique(return_inverse=True) reasonably efficiently using np.searchsorted (it's an additional O(n log n), so shouldn't change the performance significantly):

u = np.unique(arr)
indices = np.searchsorted(u, arr.flat)

Categories

python - How to find most frequent values in numpy ndarray?

python - How to find most frequent values in numpy ndarray?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags