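The quickest way I can think of is to split the array into equal chunks of rows and compute the mean of each chunk. However, this is a quick cheat: it only works because the rows of each category are contiguous and the groups are the same size, so it falls short if you want to generalize your solution to other forms of list: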
>>> [x.mean() for x in np.split(np.array(array), 2)]
[2.40625, 2.5875]
A more appropriate solution is to prepare a dictionary keyed by category, then append each row to the matching entry in turn. I have renamed list to keys.
>>> res = {k: [] for k in set(keys)}
>>> res
{'A': [], 'B': []}
>>> for k, row in zip(keys, array):
...     res[k] += row
...
>>> res
{'A': [5.1, 3.5, 1.4, 0.2, 4.9, 3.0, 1.4, 0.2, 4.7, 3.2, 1.3, 0.2, 4.6, 3.1, 1.5, 0.2],
'B': [5.0, 3.6, 1.4, 0.2, 5.4, 3.9, 1.7, 0.4, 4.6, 3.4, 1.4, 0.3, 5.0, 3.4, 1.5, 0.2]}
Then compute the means:
>>> [(k, sum(v)/len(v)) for k, v in res.items()]
[('B', 2.5875), ('A', 2.40625)]
This will work for any number of categories and any ordering of the category sequence keys, so long as len(keys) is equal to the number of rows.
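To illustrate that claim, here is a minimal sketch that wraps the same dictionary approach in a function and runs it on interleaved keys with three categories. The function name group_means and the sample data are made up for the illustration; they are not from the question.

```python
from collections import defaultdict

def group_means(keys, rows):
    """Return the mean of all values in the rows that share a key."""
    if len(keys) != len(rows):
        raise ValueError("need exactly one key per row")
    grouped = defaultdict(list)
    for k, row in zip(keys, rows):
        grouped[k].extend(row)  # same effect as res[k] += row
    return {k: sum(v) / len(v) for k, v in grouped.items()}

# Hypothetical sample data: three categories in interleaved order.
rows = [[1.0, 2.0], [3.0, 5.0], [4.0, 4.0], [7.0, 9.0], [2.0, 3.0]]
keys = ['B', 'A', 'C', 'A', 'B']
means = group_means(keys, rows)
print(means)
```

The explicit length check is there because zip silently stops at the shorter input, which would otherwise drop rows without warning.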
I am sure you can come up with a full NumPy alternative yourself.
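For reference, one possible NumPy-only version is sketched below. It is only one of several ways to do this. The array and keys values are reconstructed from the output shown earlier and are assumed to match the question's data.

```python
import numpy as np

# Data reconstructed from the output above (assumed to match the question).
array = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
    [5.4, 3.9, 1.7, 0.4],
    [4.6, 3.4, 1.4, 0.3],
    [5.0, 3.4, 1.5, 0.2],
])
keys = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])

# labels holds the sorted unique keys; inverse maps each row to its label index.
labels, inverse = np.unique(keys, return_inverse=True)

# Total of all values per category, and how many values each category holds.
totals = np.bincount(inverse, weights=array.sum(axis=1))
counts = np.bincount(inverse) * array.shape[1]

means = dict(zip(labels.tolist(), (totals / counts).tolist()))
print(means)
```

If you prefer readability over full vectorization, a boolean mask per category, array[keys == k].mean(), should give the same numbers.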