Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Splitting train and test data by a particular variable

I am using this code to split my data into train and test sets for a logistic regression:

"""

from sklearn.model_selection import train_test_split

# Split the data into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=10
)

"""

Instead of a random split, I would like to split the data by issue_dt, a variable holding the date the loan was issued. That variable should not be used as a feature in the logistic regression itself. Any input on this would be appreciated.

question from:https://stackoverflow.com/questions/65850613/splitting-train-and-test-data-by-a-particular-variable


1 Answer


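Assume X and Y are pandas DataFrames (or Series) with matching row indexes, and that 'issue_dt' is a column in X.

The following code might help:

# Features without the date column, so the model never sees it
X_drop = X.drop(columns=['issue_dt'])
# Boolean mask: True for rows issued before the cutoff date
ind = X['issue_dt'] < a_specific_date  # e.g., a_specific_date = X['issue_dt'].iloc[10]

# Rows before the cutoff go to train, the rest go to test
X_train, X_test = X_drop[ind], X_drop[~ind]
Y_train, Y_test = Y[ind], Y[~ind]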
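As a rough, self-contained sketch of the same idea, here is a version you can run on toy data. The helper name split_by_date, the columns loan_amnt and default, and the rule for picking the cutoff (the date at the 70% position of the sorted dates, to roughly mimic test_size=0.3) are all assumptions made for illustration and are not from the question.

```python
import pandas as pd


def split_by_date(X, Y, date_col, cutoff):
    """Hypothetical helper: rows dated before `cutoff` go to train, the rest to test.

    `date_col` only decides the split; it is dropped from the returned features.
    """
    dates = pd.to_datetime(X[date_col])
    is_train = dates < pd.Timestamp(cutoff)
    features = X.drop(columns=[date_col])
    return features[is_train], features[~is_train], Y[is_train], Y[~is_train]


# Toy data (made up): ten loans issued on consecutive days.
X = pd.DataFrame({
    "issue_dt": pd.date_range("2020-01-01", periods=10, freq="D"),
    "loan_amnt": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
})
Y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], name="default")

# Assumed cutoff rule: the date sitting at the 70% position of the sorted dates.
# Comparing against a date (not a row position) keeps same-day loans together.
sorted_dates = X["issue_dt"].sort_values().reset_index(drop=True)
cutoff = sorted_dates.iloc[len(sorted_dates) * 7 // 10]

X_train, X_test, Y_train, Y_test = split_by_date(X, Y, "issue_dt", cutoff)
print(len(X_train), len(X_test))  # 7 3
```

The resulting X_train and Y_train can then be passed to the logistic regression as before. Because the split is a comparison against a date, the train/test proportions are only approximate when many loans share the same issue date.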

