python - Split dataset containing multiple labels


asked Oct 6, 2021 in Technique by 深蓝 (71.8m points)

I have a dataset with multiple labels, i.e. for each X I have two y values, and I need to split it into train and test sets.

I tried with the sklearn function train_test_split():

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1,10,10)
y2 = np.random.randint(1,3,10)

X_train, X_test, [Y1_train, Y2_train], [Y1_test, Y2_test] = train_test_split(X, [y1, y2], test_size=0.4, random_state=42)

But I get an error message:

ValueError: Found input variables with inconsistent numbers of samples: [10, 2]

question from:https://stackoverflow.com/questions/66056596/split-dataset-containing-multiple-labels

1 Answer

深蓝 · Answer 1 · 2021-10-06T03:09:24+0000

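Combine the two label arrays into a single list with one [y1, y2] pair per sample, and pass that list as y. This code should work for you: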

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1,10,10)
y2 = np.random.randint(1,3,10)
# Pair up the labels so that each sample has one [y1, y2] entry.
y = [[y1[i], y2[i]] for i in range(len(y1))]

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.4, random_state=42)

It produces output like the following (the exact values vary between runs because the data is random):

print(X_train)
[ 0.42534237  1.35471168  0.00640736  1.34057234  0.50608562 -1.73341641]

and

print(Y_train)
[[3, 1], [7, 1], [6, 2], [4, 2], [6, 2], [2, 2]]

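In your code, the label list [y1, y2] has the shape (2, 10), so train_test_split sees 2 samples, while the input array X has the shape (10,), i.e. 10 samples. That mismatch is what the [10, 2] in the error message reports.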

print([y1,y2])
[array([2, 3, 7, 6, 4, 9, 2, 3, 6, 6]), array([2, 2, 1, 2, 2, 2, 2, 1, 1, 2])]

print(np.array([y1,y2]).shape)
(2, 10)

print(X.shape)
(10,)

The labels need the shape (10, 2) instead, with one row per sample:

print(y)
[[2, 2], [3, 2], [7, 1], [6, 2], [4, 2], [9, 2], [2, 2], [3, 1], [6, 1], [6, 2]]

print(np.array(y).shape)
(10, 2)
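
As a vectorized alternative to the list comprehension, NumPy's np.column_stack can build the (10, 2) label array directly. This is a minimal sketch, assuming y1 and y2 are 1-D arrays of equal length; the seeded generator is not part of the original code and is used here only to make the run repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)  # assumption: seeded only for repeatability
X = rng.standard_normal(10)
y1 = rng.integers(1, 10, size=10)  # labels in 1..9, like randint(1, 10, 10)
y2 = rng.integers(1, 3, size=10)   # labels in 1..2, like randint(1, 3, 10)

# Stack the two label vectors as columns: one row per sample.
y = np.column_stack([y1, y2])
print(y.shape)  # (10, 2)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Because Y_train is a 2-D array, each label column can be sliced out again.
y1_train, y2_train = Y_train[:, 0], Y_train[:, 1]
```

Keeping the labels in an array rather than a list of lists makes it easy to separate the two label columns after the split.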

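The input and the labels do not need identical shapes, but they must have the same number of samples, i.e. the same length along the first axis.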
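
Another option, if the two label arrays should stay separate, is to pass them as individual arguments: train_test_split accepts any number of arrays of equal length and returns a train/test pair for each, all split with the same shuffled indices. The sketch below assumes the same X, y1 and y2 as in the question; the variable names on the left-hand side are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1, 10, 10)
y2 = np.random.randint(1, 3, 10)

# Each positional array yields two outputs (train, test), in input order.
X_train, X_test, y1_train, y1_test, y2_train, y2_test = train_test_split(
    X, y1, y2, test_size=0.4, random_state=42
)

print(X_train.shape, y1_train.shape, y2_train.shape)  # (6,) (6,) (6,)
print(X_test.shape, y1_test.shape, y2_test.shape)     # (4,) (4,) (4,)
```

This avoids reshaping the labels at all, at the cost of a longer unpacking line.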
