Input for roc_auc_score:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
sklearn.metrics.roc_auc_score(y_true, y_score, *, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', labels=None)[source]
y_true array-like of shape (n_samples,) or (n_samples, n_classes)
True labels or binary label indicators. The binary and multiclass cases expect labels with shape (n_samples,) while the multilabel case expects binary label indicators with shape (n_samples, n_classes).
y_score array-like of shape (n_samples,) or (n_samples, n_classes)
Target scores.
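To make those shape requirements concrete, here is a minimal sketch of the binary case (the example data is assumed, not from the docs quoted above): both `y_true` and `y_score` are 1-D arrays of length `n_samples`, with `y_score` holding predicted probabilities of the positive class.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])            # true labels, shape (n_samples,)
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # predicted probabilities, shape (n_samples,)

# 3 of the 4 positive/negative score pairs are ranked correctly -> AUC = 0.75
print(roc_auc_score(y_true, y_score))
```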
To validate the input parameters:
y_true: correct
y_score: result.predict(X), based on https://www.statsmodels.org/devel/examples/notebooks/generated/predict.html
But your validation is missing the train/test split, which is necessary in machine learning. Normally you divide the dataset into a training part and a test part, fit on the former, and evaluate on the latter. In pseudocode this would look like:
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

y = generate_data(dependent_var)    # pseudocode
X = generate_data(independent_var)  # pseudocode
X = sm.add_constant(X)              # add the intercept column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

logit_model = sm.Logit(y_train, X_train)
result = logit_model.fit()

roc_auc_score(y_test, result.predict(X_test))