I am new to Python and to programming in general, and my teacher gave me some work. Part of it is to work with the MNIST database of handwritten digits. Each digit is a vector of 728 components.
The problem comes when I want to compute the centroid of each class, that is, the mean of all the digits in that class along each of the 728 dimensions.
If I had two dimensions, I know I should do something like
avgx=(x1+x2+x3)/3
and so on...
But I don't know how to do it with 728 dimensions. What I have tried is this:
labels = np.array(load_digits().target)
numbers = np.array(load_digits().data)
centroid=[]
i=0
count=[]
value=[0]*10
while(i<1):
    j=0
    value[i]=0
    while j<len(labels):
        if(labels[j]==i):
            count[i]=count[i]+1
            value[i]=value[i]+numbers[j]
        j=j+1
    valud=value[i]
    centroid.append(x/count[i] for x in valud)
    i=i+1
But it returns <generator object <genexpr> at 0x000002ADA1818F90>
instead of a 728-dimensional vector, which would be the centroid of digit 0, then digit 1, and so on.
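For reference, the `<generator object <genexpr>>` output is what Python shows when a bare generator expression such as `x/count[i] for x in valud` is passed to `append`: the expression is stored as a lazy object and never evaluated. Here is a minimal sketch of that behavior, using made-up values rather than the real data (the names `total` and `n` are hypothetical stand-ins for `value[i]` and `count[i]`):

```python
# Hypothetical stand-ins for value[i] and count[i] from the code above.
total = [2.0, 4.0, 6.0]
n = 2

centroid = []
# A bare generator expression is appended as a lazy object, not as numbers.
centroid.append(x / n for x in total)
lazy = centroid[0]

# A list comprehension (or list(...) around the generator) evaluates it.
fixed = [x / n for x in total]

print(type(lazy).__name__)  # generator
print(fixed)                # [1.0, 2.0, 3.0]
```

With a NumPy array, dividing the array directly by the count should have the same effect without any loop.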
EDIT: Thanks to one answer, I modified the code to this:
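import numpy as np
from sklearn.datasets import load_digits

centroid=[]
labels = np.array(load_digits().target)
numbers = np.array(load_digits().data)
i=0
#First we need to calculate the centroid
while(i<10):
    j=0
    x=[]  # samples that belong to class i
    while j<len(labels):
        if(labels[j]==i):
            x.append(numbers[j])
        j=j+1
    avg=np.array(x)
    # mean over the samples (axis 0) gives one value per dimension
    centroid.append((avg.mean(axis=0)))
    i=i+1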
And it works perfectly, thank you so much.
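The same per-class mean can probably be written more compactly with boolean masks instead of nested `while` loops. This is only a sketch under assumptions: the helper name `class_centroids` and the toy data are made up for illustration, and it takes any array of samples plus a matching array of labels.

```python
import numpy as np

def class_centroids(data, labels):
    """Return the distinct labels and one mean vector per label (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # data[labels == c] keeps the rows of class c; mean(axis=0) averages them per dimension.
    centroids = np.array([data[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

# Tiny made-up dataset: four 3-dimensional samples in two classes.
toy_data = [[0, 0, 0], [2, 4, 6], [1, 1, 1], [3, 3, 3]]
toy_labels = [0, 0, 1, 1]
classes, centroids = class_centroids(toy_data, toy_labels)
print(centroids)
```

Called as `class_centroids(numbers, labels)` on the arrays above, it should give one row per digit. Note that `load_digits` from scikit-learn provides 8x8 images, so each centroid from that dataset has 64 components.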
question from:
https://stackoverflow.com/questions/65907634/centroid-of-n-dimension-dataset