Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - Getting AUC from sklearn using preds from Statsmodel instance

I have a logistic regression that I want to know the AUC for.

I created the logistic regression model using statsmodels:

import statsmodels.api as sm

y = generate_data(dependent_var) # pseudocode
X = generate_data(independent_var) # pseudocode

X['constant'] = 1

logit_model=sm.Logit(y,X)
result=logit_model.fit()

Then, I use sklearn to get an AUC score for my model predictions:

from sklearn.metrics import roc_auc_score
roc_auc_score(y, result.predict())

The code runs and I get an AUC score, but I want to make sure I am passing variables between the two packages correctly.

question from:https://stackoverflow.com/questions/65601232/getting-auc-from-sklearn-using-preds-from-statsmodel-instance


1 Answer


Inputs for roc_auc_score:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html

sklearn.metrics.roc_auc_score(y_true, y_score, *, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', labels=None)

y_true array-like of shape (n_samples,) or (n_samples, n_classes)

True labels or binary label indicators. The binary and multiclass cases expect labels with shape (n_samples,) while the multilabel case expects binary label indicators with shape (n_samples, n_classes).

y_score array-like of shape (n_samples,) or (n_samples, n_classes)

Target scores.
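To make "target scores" concrete: for the binary case, AUC depends only on how the scores rank positives against negatives, which is why predicted probabilities from another package are a valid y_score. The following is a minimal sketch of that idea in plain Python, not scikit-learn's implementation; pairwise_auc and the sample data are hypothetical names and values chosen for illustration.

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Illustrative helper, not part of scikit-learn.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]  # made-up predicted probabilities
auc = pairwise_auc(labels, probs)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Because only the ordering matters, rescaling the scores (for example multiplying every probability by 10) leaves the result unchanged.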

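Checking your inputs against that signature:

y_true: y is correct.

y_score: result.predict() is correct. Called without arguments, it returns the fitted probabilities for the data the model was trained on, so it is equivalent to result.predict(X). See https://www.statsmodels.org/devel/examples/notebooks/generated/predict.html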

However, your validation is missing a train/test split, which is standard practice in machine learning: you normally divide the dataset into a training part and a test part, fit the model on the former, and score it on the latter. In pseudocode this would look like:

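import statsmodels.api as sm
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

y = generate_data(dependent_var) # pseudocode
X = generate_data(independent_var) # pseudocode

X['constant'] = 1  # intercept column; sm.Logit does not add one itself

# hold out part of the data; test_size and random_state are example values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# fit on the training part only
logit_model = sm.Logit(y_train, X_train)
result = logit_model.fit()

# score on the held-out part
roc_auc_score(y_test, result.predict(X_test))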
