Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Centroid of N-Dimension dataset

I am new to Python and to programming in general, and my teacher gave me some work. Part of it is to work with the MNIST database of handwritten digits. Each digit is a vector of 728 components. The problem comes when I want to compute the centroid of each class, that is, the mean of all the digits in a class along each of the 728 dimensions. If I had two dimensions, I know I should do something like

avgx=(x1+x2+x3)/3

and so on... But I don't know how to do it with 728 dimensions. What I have tried is this:

labels = np.array(load_digits().target)
numbers = np.array(load_digits().data)
centroid=[]

i=0
count=[]
value=[0]*10
while(i<1):
    j=0
    
    value[i]=0
    
    while j<len(labels):
        
        if(labels[j]==i):
             count[i]=count[i]+1
             value[i]=value[i]+numbers[j]
             
        j=j+1
    
    valud=value[i]
    centroid.append(x/count[i] for x in valud)
    
    i=i+1

But it returns <generator object <genexpr> at 0x000002ADA1818F90> instead of a 728-dimensional vector, which would be the centroid of digit 0, then digit 1, and so on...

EDIT: thanks to one answer, I modified the code to this:

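import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()  # load the dataset once and reuse it
labels = np.array(digits.target)
numbers = np.array(digits.data)
centroid = []
i = 0
# First we need to calculate the centroid of each class (digits 0 to 9)
while i < 10:
    j = 0
    x = []  # samples whose label is i
    while j < len(labels):
        if labels[j] == i:
            x.append(numbers[j])
        j = j + 1
    avg = np.array(x)
    # The mean along axis 0 averages each dimension separately
    centroid.append(avg.mean(axis=0))
    i = i + 1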

And it works perfectly, thank you so much.

question from:https://stackoverflow.com/questions/65907634/centroid-of-n-dimension-dataset


1 Answer


You are using NumPy arrays, so you should take advantage of everything they have to offer.

If you have an array of 10 vectors with 728 elements each:

>>> import numpy as np
>>> a = np.random.random((10,728))
>>> a.shape
(10, 728)

Just take the mean along the first axis.

>>> centroid = a.mean(axis=0)
>>> centroid.shape
(728,)
>>>
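
The same idea extends to one centroid per class: select the rows of each class with a boolean mask, then take the mean along the first axis. This is only a minimal sketch. It assumes `numbers` is an (n_samples, n_features) array and `labels` holds the class of each row, as in the question. The small synthetic arrays and the helper name `class_centroids` are made up for illustration and do not come from the question.

```python
import numpy as np

def class_centroids(numbers, labels):
    """Return (classes, centroids); row k of centroids is the mean of the rows labelled classes[k]."""
    classes = np.unique(labels)
    # labels == c is a boolean mask that keeps only the rows of class c
    centroids = np.array([numbers[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

# Synthetic stand-in data (assumption): 6 samples, 4 dimensions, 3 classes.
numbers = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
    [1.0, 3.0, 5.0, 7.0],
    [3.0, 5.0, 7.0, 9.0],
    [4.0, 4.0, 4.0, 4.0],
    [10.0, 0.0, 10.0, 0.0],
])
labels = np.array([0, 0, 1, 1, 2, 0])

classes, centroids = class_centroids(numbers, labels)
print(centroids.shape)  # (3, 4)
print(centroids[1])     # [2. 4. 6. 8.]
```

Nothing here depends on the number of dimensions, so the same function should work unchanged for vectors of any length.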

You should spend some time with the Absolute Beginners Tutorial and the other tutorials in the NumPy documentation.
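
As for the `<generator object <genexpr>>` output in the question: an expression of the form `x/count[i] for x in valud` passed directly to `append` is a generator expression, so the list receives a lazy generator rather than the computed values. The sketch below illustrates the difference. The names `value` and `count` and their contents are stand-ins chosen for illustration, not the question's actual data.

```python
import numpy as np

value = np.array([2.0, 4.0, 6.0])  # stand-in for the summed vectors of one class
count = 2                          # stand-in for the number of samples in that class

centroid = []
centroid.append(x / count for x in value)    # generator expression: appends a lazy generator
centroid.append([x / count for x in value])  # list comprehension: appends the computed values
centroid.append(value / count)               # NumPy divides the whole array element-wise

print(type(centroid[0]).__name__)  # generator
print(centroid[2])                 # [1. 2. 3.]
```

With NumPy arrays the third form is usually the simplest, since no explicit loop is needed.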

