Data:
import pandas as pd
data = pd.DataFrame({'classes':[1,1,1,2,2,2,2],'b':[3,4,5,6,7,8,9], 'c':[10,11,12,13,14,15,16]})
My code:
import numpy as np
from sklearn.model_selection import train_test_split
X = np.array(data[['b','c']])
y = np.array(data['classes'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=4)
Question:
train_test_split chooses the test set at random from all classes. Is there a way to get the same number of test samples from each class? (For example, two samples from class 1 and two from class 2. Note that the classes do not contain the same number of samples.)
Expected result:
y_test
array([1, 2, 2, 1], dtype=int64)
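One possible approach, as a rough sketch rather than a definitive answer: train_test_split's stratify option preserves class proportions, not equal counts, so the test indices can instead be drawn per class by hand with NumPy. The helper name equal_per_class_split and its parameters n_test_per_class and seed are made up for illustration and are not part of scikit-learn.

```python
import numpy as np

def equal_per_class_split(X, y, n_test_per_class, seed=None):
    """Hypothetical helper: hold out n_test_per_class samples of every class."""
    rng = np.random.default_rng(seed)
    parts = []
    for label in np.unique(y):
        # Positions of all samples belonging to this class
        label_idx = np.flatnonzero(y == label)
        if len(label_idx) < n_test_per_class:
            raise ValueError(f"class {label!r} has only {len(label_idx)} samples")
        # Draw the requested number of positions without replacement
        parts.append(rng.choice(label_idx, size=n_test_per_class, replace=False))
    # Shuffle so the test set is not grouped by class
    test_idx = rng.permutation(np.concatenate(parts))
    # Everything that was not drawn stays in the training set
    train_mask = np.ones(len(y), dtype=bool)
    train_mask[test_idx] = False
    return X[train_mask], X[test_idx], y[train_mask], y[test_idx]

# Same values as columns 'b', 'c' and 'classes' in the question
X = np.array([[3, 10], [4, 11], [5, 12], [6, 13], [7, 14], [8, 15], [9, 16]])
y = np.array([1, 1, 1, 2, 2, 2, 2])

X_train, X_test, y_train, y_test = equal_per_class_split(
    X, y, n_test_per_class=2, seed=0
)
print(y_test)
```

With the data from the question this leaves one sample of class 1 and two of class 2 for training; the order of labels in y_test depends on the seed.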