python - Centroid of N-Dimension dataset

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I am new to Python and to programming in general, and my teacher gave me some work. Part of it is to work with the MNIST database of handwritten digits. Each digit is a vector of 728 components. The problem comes when I want to compute the centroid of each class, that is, the mean of all samples of a digit in each of the 728 dimensions. If I had two dimensions, I know I should do something like

avgx=(x1+x2+x3)/3

and so on... But I don't know how to do it with 728 dimensions. What I have tried is this:

labels = np.array(load_digits().target)
numbers = np.array(load_digits().data)
centroid=[]

i=0
count=[]
value=[0]*10
while(i<1):
    j=0
    
    value[i]=0
    
    while j<len(labels):
        
        if(labels[j]==i):
             count[i]=count[i]+1
             value[i]=value[i]+numbers[j]
             
        j=j+1
    
    valud=value[i]
    centroid.append(x/count[i] for x in valud)
    
    i=i+1

But it returns <generator object <genexpr> at 0x000002ADA1818F90> instead of a 728-dimensional vector, which would be the centroid of digit 0, then digit 1, and so on...

EDIT: thanks to one answer, I modified the code to this:

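import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
labels = np.array(digits.target)
numbers = np.array(digits.data)
centroid = []
i = 0
# First we need to calculate the centroid of each digit class
while i < 10:
    j = 0
    x = []  # all samples labelled i
    while j < len(labels):
        if labels[j] == i:
            x.append(numbers[j])
        j = j + 1
    avg = np.array(x)
    centroid.append(avg.mean(axis=0))  # per-feature mean of class i
    i = i + 1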

And it works perfectly, thank you so much.

question from:https://stackoverflow.com/questions/65907634/centroid-of-n-dimension-dataset

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:12:32+0000

You are using NumPy arrays, so you should take advantage of everything they have to offer.

If you have an array of 10 vectors with 728 elements each:

>>> import numpy as np
>>> a = np.random.random((10,728))
>>> a.shape
(10, 728)

Just take the mean along the first axis.

>>> centroid = a.mean(axis=0)
>>> centroid.shape
(728,)
>>>
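To get one centroid per class rather than a single centroid for the whole array, the same mean(axis=0) can be applied to the rows of each label, selected with a boolean mask. The following is only a sketch under assumptions: the data is random and stands in for the real digits, and the names numbers and labels are borrowed from the question.

```python
import numpy as np

# Synthetic stand-in for the real data (assumption): 100 samples with
# 728 features each, plus an integer class label 0..9 per sample.
rng = np.random.default_rng(0)
numbers = rng.random((100, 728))
labels = np.arange(100) % 10  # every class appears exactly 10 times

# labels == c is a boolean mask that selects the rows of class c;
# mean(axis=0) then averages those rows feature by feature.
centroids = np.array([numbers[labels == c].mean(axis=0) for c in range(10)])

print(centroids.shape)
```

Row c of centroids is then the centroid of class c, so no explicit inner loop over the samples is needed.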

You should spend some time with the Absolute Beginners Tutorial and the other tutorials in the NumPy documentation.
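As for the <generator object <genexpr> ...> output in the question: an expression such as (x/count[i] for x in valud) passed to append is a generator expression, which is stored unevaluated instead of being turned into values. A small sketch of the difference, using a hypothetical three-element sum in place of the real 728-element one:

```python
import numpy as np

total = np.array([2.0, 4.0, 6.0])  # hypothetical per-feature sum
count = 2                          # hypothetical number of samples

lazy = (x / count for x in total)     # generator expression: nothing computed yet
as_list = [x / count for x in total]  # list comprehension: values computed now
as_array = total / count              # NumPy divides every element at once

print(lazy)
print(as_list)
print(as_array)
```

With NumPy arrays the last form is usually the simplest, since dividing an array by a scalar already works element by element.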
