I just installed Python and I'm a complete beginner. My first task was to build a chart in JupyterLab using the Iris data set. Below is the code I used to cluster it in a Jupyter notebook:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
In [2]:
iris_frame=pd.read_csv("Iris.csv")
iris_frame.head()
In [3]:
x=iris_frame.drop(columns=["Species","Id"] ,axis=1)
y=iris_frame.Species
from sklearn.preprocessing import LabelEncoder
encode=LabelEncoder()
y=encode.fit_transform(y)
y
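Here is what cell 3 actually produces. This sketch rebuilds only three rows in the same layout as Iris.csv (the rows with Ids 1, 51 and 101 of the Iris data set; the real file has 150 rows). It suggests that `x` is not a single column for the X axis. Instead, it is a table of all four measurements. `y` holds the species as integer codes, and the KMeans cell below never uses it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Three rows of the Iris data set in the Iris.csv column layout,
# standing in for the full 150-row file
iris_frame = pd.DataFrame({
    "Id": [1, 51, 101],
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "SepalWidthCm": [3.5, 3.2, 3.3],
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "PetalWidthCm": [0.2, 1.4, 2.5],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Features: every column except the row id and the species answer
x = iris_frame.drop(columns=["Species", "Id"])

# Target: species names turned into integer codes (classes sorted alphabetically)
encode = LabelEncoder()
y = encode.fit_transform(iris_frame["Species"])

print(list(x.columns))  # the four measurement columns
print(x.shape)          # (number of rows, 4)
print(y, list(encode.classes_))
```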
In [4]:
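from sklearn.cluster import KMeans
model = KMeans(n_clusters=3, random_state=1)
y_pred = model.fit_predict(x)
# Convert to a NumPy array so the x[mask, column] indexing in the plot works
x = x.values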
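You can quickly check that the model in cell 4 is fitted on every feature column. This sketch swaps in scikit-learn's bundled `load_iris()` for Iris.csv so it runs without the file. That substitution is an assumption: its `data` array holds the same four measurement columns, without Id and Species. `n_init=10` is passed explicitly only to avoid version-dependent warnings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Four measurement columns: sepal length/width, petal length/width
X = load_iris().data

model = KMeans(n_clusters=3, random_state=1, n_init=10)
y_pred = model.fit_predict(X)

print(X.shape)                       # rows x feature columns
print(model.n_features_in_)          # number of columns KMeans was fitted on
print(model.cluster_centers_.shape)  # one centre per cluster, one coordinate per column
```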
In [5]:
# Visualising the clusters on the first two feature columns (sepal length, sepal width)
# Note: KMeans numbers its clusters arbitrarily, so these species labels are a guess
plt.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1],
            s=100, c='magenta', label='Iris-setosa')
plt.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1],
            s=100, c='blue', label='Iris-versicolour')
plt.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1],
            s=100, c='green', label='Iris-virginica')
# Plotting the centroids of the clusters (same two coordinates)
plt.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1],
            s=100, c='black', label='Centroids')
plt.legend()
plt.show()
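The scatter calls in cell 5 index columns 0 and 1 of `x`. As a result, the chart shows only sepal length against sepal width, even though the clusters were computed from all four columns. Below is a minimal sketch of a helper that plots any pair of columns. `plot_cluster_pair` and `FEATURES` are hypothetical names. The helper labels clusters by number because KMeans does not know which cluster corresponds to which species.

```python
import matplotlib.pyplot as plt
import numpy as np

# Column order of x after dropping Id and Species (assumed from Iris.csv)
FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

def plot_cluster_pair(X, labels, centers, i, j, ax=None):
    """Scatter each cluster on feature columns i and j (hypothetical helper).

    X must be a NumPy array (e.g. x.values), labels the cluster ids from
    fit_predict, and centers the model's cluster_centers_ array.
    """
    if ax is None:
        _, ax = plt.subplots()
    for k in np.unique(labels):
        mask = labels == k
        ax.scatter(X[mask, i], X[mask, j], s=100, label=f"Cluster {k}")
    # Centroids live in the full 4-D feature space; show the same two coordinates
    ax.scatter(centers[:, i], centers[:, j], s=100, c="black", label="Centroids")
    ax.set_xlabel(FEATURES[i])
    ax.set_ylabel(FEATURES[j])
    ax.legend()
    return ax
```

For example, `plot_cluster_pair(x, y_pred, model.cluster_centers_, 2, 3)` followed by `plt.show()` would draw petal length against petal width. This assumes `x` has already been converted with `.values`.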
There are 6 columns in the data set: Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm and Species.
But from the code below, it looks like only two columns are used, one for the X axis and one for the Y axis. Is that right? For example:
In [3]:
x=iris_frame.drop(columns=["Species","Id"] ,axis=1)
y=iris_frame.Species
from sklearn.preprocessing import LabelEncoder
encode=LabelEncoder()
y=encode.fit_transform(y)
y
If so, why aren't the other columns used in the clustering? To cluster accurately, shouldn't it use the data from all the columns? I'd appreciate an explanation.
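To illustrate how k-means uses every column, here is a plain Lloyd's-algorithm k-means in NumPy. It is a simplified stand-in, not scikit-learn's implementation, which adds k-means++ initialisation and several restarts. `kmeans_lloyd` is a hypothetical name. The distance from each point to each centre is summed over all columns. In the made-up toy data below, the first two columns are identical for every point, so the split can only come from columns 3 and 4.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means; X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from k distinct rows chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre,
        # summed over ALL feature columns (axis=2)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy values: identical in the first two columns, different in the last two
X = np.array([
    [5.0, 3.0, 1.0, 0.2],
    [5.0, 3.0, 1.1, 0.3],
    [5.0, 3.0, 5.0, 1.8],
    [5.0, 3.0, 5.1, 1.9],
])
labels, centers = kmeans_lloyd(X, k=2)
print(labels)  # points 0 and 1 share one cluster, points 2 and 3 the other
```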
P.S. I know nothing about Python; this is my first day. :)
This is the notebook I used to build it:
https://github.com/MeghanaKankanala/TSF/blob/main/Iris_clustering.ipynb
Thank you very much