python - How to get a non-shuffled train_test_split in sklearn

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

If I want a random train/test split, I use the sklearn helper function:

In [1]: from sklearn.model_selection import train_test_split
   ...: train_test_split([1,2,3,4,5,6])
   ...:
Out[1]: [[1, 6, 4, 2], [5, 3]]

What is the most concise way to get a non-shuffled train/test split, i.e.

[[1,2,3,4], [5,6]]

EDIT: Currently I am using

train, test = data[:int(len(data) * 0.75)], data[int(len(data) * 0.75):]

but I am hoping for something a little nicer. I have opened an issue on scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/8844
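For anyone who wants the slicing approach without repeating the index expression, here is a minimal sketch that wraps it in a helper. The name `ordered_split` and its `test_size` default are hypothetical and not part of scikit-learn. It assumes `data` supports `len()` and slicing, and it rounds the test size up, which is a design choice of this sketch.

```python
import math


def ordered_split(data, test_size=0.25):
    """Split a sliceable sequence into (train, test) without shuffling.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    # Round the test-set size up, then give everything before it to train.
    n_test = math.ceil(len(data) * test_size)
    split_at = len(data) - n_test
    return data[:split_at], data[split_at:]


train, test = ordered_split([1, 2, 3, 4, 5, 6])
print(train, test)
```

Computing the split index once avoids the two slices drifting apart if the fraction is edited in only one place.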

EDIT 2: My PR has been merged. As of scikit-learn version 0.19, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split.

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:39:34+0000

I'm not adding much to Psidom's answer except a function that is easy to copy and paste:

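import numpy as np

def non_shuffling_train_test_split(X, y, test_size=0.2):
    # Round the test-set size up so the test set gets at least the requested fraction
    n_test = int(np.ceil(test_size * X.shape[0]))
    i = X.shape[0] - n_test  # index of the first test sample
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test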

Update: This feature is now built in (since scikit-learn 0.19, according to the second edit to the question), so you can do:

from sklearn.model_selection import train_test_split
train_test_split(X, y, test_size=0.2, shuffle=False)
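As a quick check, here is a small sketch that applies the built-in option to the list from the question. It assumes scikit-learn 0.19 or later is installed and leaves `test_size` at its default; the variable names are only illustrative.

```python
from sklearn.model_selection import train_test_split

# Same data as in the question; shuffle=False keeps the original order.
data = [1, 2, 3, 4, 5, 6]
train, test = train_test_split(data, shuffle=False)
print(train, test)
```

With the default test size, this should reproduce the [[1,2,3,4], [5,6]] split asked for in the question.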
