python - Calculate mean() of NumPy 2D-array grouped by values in a separate list with strings corresponding to each row in the 2D array

Question

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I'm attending a course on Data Analysis with Python (NumPy, pandas, etc.).

We have an assignment where we are supposed to calculate the mean() of an array based on the values of another list. This might seem a bit unclear, so here's an example:

list = ['A','A','A','A','B','B','B','B']
array = [ [5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2] ]

The list values correspond to categories for the rows in the array, and we are asked to calculate the mean of each column grouped by A and B. I suppose this could be done by converting the data into a pandas DataFrame, but the assignment pertains to NumPy, so I suppose we are somehow supposed to solve it without pandas.

I have struggled and googled to no avail. Any help is much appreciated.

Thanks

B.R. Anders

Question from: https://stackoverflow.com/questions/65952132/calculate-mean-of-nympy-2d-array-grouped-by-values-in-a-separate-list-with-str

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:02:21+0000

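The quickest way I can think of is to split the rows into equal blocks with np.split and take the mean of each block. However, this is a quick cheat: it only works because the keys here form two contiguous groups of equal size, so it will not generalize to other forms of list. Note also that mean() without an axis argument averages every value in a block, so you get one number per group rather than one per column:

>>> import numpy as np
>>> [x.mean() for x in np.split(np.array(array), 2)]
[2.40625, 2.5875]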
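As a sketch of the per-column version the question asks for, the same split works with axis=0. It still assumes the rows form two contiguous, equal-sized groups (A rows first, then B); the data is copied from the question:

```python
import numpy as np

# Data from the question
array = [[5.1, 3.5, 1.4, 0.2],
         [4.9, 3.0, 1.4, 0.2],
         [4.7, 3.2, 1.3, 0.2],
         [4.6, 3.1, 1.5, 0.2],
         [5.0, 3.6, 1.4, 0.2],
         [5.4, 3.9, 1.7, 0.4],
         [4.6, 3.4, 1.4, 0.3],
         [5.0, 3.4, 1.5, 0.2]]

# Assumes two contiguous, equal-sized groups: rows 0-3 are A, rows 4-7 are B
blocks = np.split(np.array(array), 2)

# axis=0 averages down each column, giving one mean per column per group
col_means = [block.mean(axis=0) for block in blocks]
```

Here col_means[0] should come out at roughly [4.825, 3.2, 1.4, 0.2] for A and col_means[1] at roughly [5.0, 3.575, 1.5, 0.275] for B.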

A more general solution is to prepare a dictionary of categories, then append each row to the entry for its category. I have renamed list to keys, since list shadows Python's built-in list.

>>> keys = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
>>> res = {k: [] for k in keys}   # one empty list per category, in first-seen order
>>> res
{'A': [], 'B': []}

>>> for k, row in zip(keys, array):
...     res[k] += row   # += extends the list, so each group becomes one flat list
...

>>> res
{'A': [5.1, 3.5, 1.4, 0.2, 4.9, 3.0, 1.4, 0.2, 4.7, 3.2, 1.3, 0.2, 4.6, 3.1, 1.5, 0.2],
 'B': [5.0, 3.6, 1.4, 0.2, 5.4, 3.9, 1.7, 0.4, 4.6, 3.4, 1.4, 0.3, 5.0, 3.4, 1.5, 0.2]}

Then compute the means:

>>> [(k, sum(v)/len(v)) for k, v in res.items()]
[('A', 2.40625), ('B', 2.5875)]

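This works for any number of categories and any ordering of the category sequence keys, as long as len(keys) equals the number of rows. Like the split version, though, it flattens each group, so it gives one mean per category rather than one per column.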
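If the per-column means are what you need, a small variant of the loop (my adjustment, not part of the original answer) appends whole rows instead of using +=, then averages column by column with zip(*rows). This is plain Python, assuming keys and array are the lists from the question:

```python
# Data from the question, with list renamed to keys
keys = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
array = [[5.1, 3.5, 1.4, 0.2],
         [4.9, 3.0, 1.4, 0.2],
         [4.7, 3.2, 1.3, 0.2],
         [4.6, 3.1, 1.5, 0.2],
         [5.0, 3.6, 1.4, 0.2],
         [5.4, 3.9, 1.7, 0.4],
         [4.6, 3.4, 1.4, 0.3],
         [5.0, 3.4, 1.5, 0.2]]

# One list of rows per category, in first-seen order
res = {k: [] for k in keys}
for k, row in zip(keys, array):
    # append keeps each row intact; += would splice its values into one flat list
    res[k].append(row)

# zip(*rows) transposes a group's rows into columns; average each column
means = {k: [sum(col) / len(col) for col in zip(*rows)] for k, rows in res.items()}
```

Because the rows stay intact, each entry of means holds four values, one per column, and the approach still works for any number of categories in any order.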

I am sure you can come up with a full NumPy alternative yourself.
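For reference, here is one possible full NumPy version, a sketch rather than the course's intended solution. It shows two options: a boolean mask per category, and a vectorised variant that uses np.unique(..., return_inverse=True) to turn each label into an integer group id and np.add.at to sum rows per group. The names data, labels and group_ids are my own:

```python
import numpy as np

# Data from the question, converted to NumPy arrays
keys = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, 3.0, 1.4, 0.2],
                 [4.7, 3.2, 1.3, 0.2],
                 [4.6, 3.1, 1.5, 0.2],
                 [5.0, 3.6, 1.4, 0.2],
                 [5.4, 3.9, 1.7, 0.4],
                 [4.6, 3.4, 1.4, 0.3],
                 [5.0, 3.4, 1.5, 0.2]])

# Sorted unique labels, plus the index of each row's label within them
labels, group_ids = np.unique(keys, return_inverse=True)

# Option 1: boolean mask per category, then column means of the selected rows
mask_means = np.array([data[keys == label].mean(axis=0) for label in labels])

# Option 2: accumulate each row into its group's sum (unbuffered, so repeated
# group ids are all added), then divide by the number of rows in each group
sums = np.zeros((len(labels), data.shape[1]))
np.add.at(sums, group_ids, data)
counts = np.bincount(group_ids, minlength=len(labels))
vec_means = sums / counts[:, None]
```

Row i of mask_means and vec_means holds the column means for labels[i]. The mask version is easier to read; the np.add.at version avoids a Python-level loop over categories, which may matter when there are many of them.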
