Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Calculate mean() of NumPy 2D array grouped by values in a separate list of strings corresponding to each row in the 2D array

I'm attending a course on Data Analysis with Python (NumPy, Pandas, etc.).

We have an assignment where we are supposed to calculate mean() of an array - based on values of another list. This might seem a bit unclear so here's an example:

list = ['A','A','A','A','B','B','B','B']
array = [ [5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2] ]

The list values correspond to categories for the rows in the array, and we are asked to calculate the mean of each column grouped by A and B. I suppose this could be done by converting the data into a Pandas dataframe, but the assignment pertains to NumPy, so I suppose we are somehow supposed to solve it without Pandas.

I have struggled and googled to no avail. Any help is much appreciated.

Thanks

B.R. Anders

question from:https://stackoverflow.com/questions/65952132/calculate-mean-of-nympy-2d-array-grouped-by-values-in-a-separate-list-with-str

1 Answer

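The quickest way I can think of is to split the rows into two halves and compute the mean of each half (x.mean() without an axis argument returns one overall mean per half). This is a quick cheat, though: it only works when the categories form equal-sized, contiguous blocks of rows, so it falls short if you want to generalize your solution to other forms of list: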

>>> import numpy as np
>>> [x.mean() for x in np.split(np.array(array), 2)]
[2.40625, 2.5875]
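
If the per-column means from the question are what you are after, the same split can be combined with an axis argument. A minimal sketch, assuming the data from the question and that the two categories sit in equal-sized, contiguous blocks of rows (the names data, halves and column_means are mine, not from the question):

```python
import numpy as np

# Rows copied from the question; the first four belong to 'A', the last four to 'B'.
data = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4],
    [4.6, 3.4, 1.4, 0.3],
    [5.0, 3.4, 1.5, 0.2],
])

# Split into two equal, contiguous blocks of rows
# (assumption: the groups are laid out this way).
halves = np.split(data, 2)

# axis=0 averages down the rows, giving one mean per column.
column_means = [half.mean(axis=0) for half in halves]

for label, means in zip(["A", "B"], column_means):
    print(label, means)
```

This should give roughly [4.825, 3.2, 1.4, 0.2] for A and [5.0, 3.575, 1.5, 0.275] for B. It still breaks as soon as the groups are interleaved or of unequal size.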

A more general solution is to prepare a dictionary of categories, then sequentially append the values of each row to the matching entry in the dictionary. I have renamed list to keys.

>>> keys = ['A','A','A','A','B','B','B','B']
>>> res = {k: [] for k in set(keys)}
>>> res
{'A': [], 'B': []}

>>> for k, row in zip(keys, array):
...     res[k] += row

>>> res
{'A': [5.1, 3.5, 1.4, 0.2, 4.9, 3.0, 1.4, 0.2, 4.7, 3.2, 1.3, 0.2, 4.6, 3.1, 1.5, 0.2],
 'B': [5.0, 3.6, 1.4, 0.2, 5.4, 3.9, 1.7, 0.4, 4.6, 3.4, 1.4, 0.3, 5.0, 3.4, 1.5, 0.2]}

Then compute the means:

>>> [(k, sum(v)/len(v)) for k, v in res.items()]
[('B', 2.5875), ('A', 2.40625)]

This will work for any number of categories and any ordering of the category sequence keys, as long as len(keys) is equal to the number of rows.
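
Note that res[k] += row flattens every row into one long list, so the result is a single mean per category. One possible variation keeps the rows intact so that each column can be averaged separately. This is only a sketch: groups and column_means are illustrative names, and the data is copied from the question.

```python
from collections import defaultdict

keys = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
array = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4],
    [4.6, 3.4, 1.4, 0.3],
    [5.0, 3.4, 1.5, 0.2],
]

# Collect whole rows per category; append (unlike +=) does not flatten them.
groups = defaultdict(list)
for k, row in zip(keys, array):
    groups[k].append(row)

# zip(*rows) transposes the rows of a group into columns,
# so each column can be averaged on its own.
column_means = {
    k: [sum(col) / len(col) for col in zip(*rows)]
    for k, rows in groups.items()
}

for k in sorted(column_means):
    print(k, column_means[k])
```

Each category now maps to a list of four means, one per column, instead of a single number.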


I am sure you can come up with a full NumPy alternative yourself.
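
For reference, here is one way such a NumPy-only version might look, using np.unique to find the categories and a boolean mask to select the rows of each one. Treat it as a sketch under assumptions rather than the intended course solution: the helper name grouped_column_means is hypothetical, and it returns per-column means, which is what the question asks for.

```python
import numpy as np


def grouped_column_means(values, labels):
    """Return (categories, means); means[i] holds the column means
    of the rows labelled categories[i]. Hypothetical helper."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    if len(labels) != len(values):
        raise ValueError("need exactly one label per row")

    # np.unique returns the distinct labels in sorted order.
    categories = np.unique(labels)

    # labels == c is a boolean mask selecting the rows of category c;
    # mean(axis=0) then averages those rows column by column.
    means = np.array([values[labels == c].mean(axis=0) for c in categories])
    return categories, means


keys = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
array = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4],
    [4.6, 3.4, 1.4, 0.3],
    [5.0, 3.4, 1.5, 0.2],
]

categories, means = grouped_column_means(array, keys)
for c, m in zip(categories, means):
    print(c, m)
```

Because the rows are selected by mask rather than by position, this does not require the categories to be contiguous or equally sized.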

