Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Difference in output between predict.rpart and predict.glm

I split a dataset into a training and a test sample. I then fit a logit model on the training data to predict the outcome in the test sample. I can do this in two ways:

Using tidymodels:

logit_mod <- logistic_reg() %>% 
 set_mode("classification") %>% 
 set_engine("glm") %>%
 fit(y ~ x + z, data=train)
res <- predict(logit_mod, new_data = test, type="prob")

Or with glm() directly:

logit_mod <- glm(y ~ x + z, data=train, family=binomial(link="logit"))
res <- predict(logit_mod, newdata=test, type="response")

The two methods give me different output (probabilities of y), although the model should be the same. Extracting logit_mod[["fit"]] from the tidymodels fit gives me the same coefficients as logit_mod fitted with glm().

Why does the second method give me different predicted probabilities?

Question from: https://stackoverflow.com/questions/65951910/difference-in-output-between-predict-rpart-and-predict-glm


1 Answer


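If you call predict() with type="response" on a binomial glm, you get the probability of the positive class (the second factor level). tidymodels returns the same probabilities as a tibble with one column per class; they can look different because a printed tibble shows only a few significant digits, so the values appear rounded.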
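To make "probability of the positive class" concrete, here is a minimal base R sketch. The data are simulated and the names toy and fit_toy are hypothetical, not taken from the question. It assumes a two-level factor response, for which glm() treats the first level as "failure" and models the probability of the other level.

```r
# Simulated data; `toy` and `fit_toy` are hypothetical names for illustration.
set.seed(1)
toy <- data.frame(y = factor(rbinom(30, 1, 0.5)), x = runif(30))

# For a factor response, glm() treats the first level as "failure" and
# models the probability of the remaining level.
levels(toy$y)  # the first level is "0", so P(y == "1") is modelled

fit_toy <- glm(y ~ x, data = toy, family = binomial)

p_pos <- predict(fit_toy, newdata = toy, type = "response")  # P(y == "1")
p_neg <- 1 - p_pos                                           # P(y == "0")

# The response-scale prediction is the inverse logit of the linear predictor.
eta <- predict(fit_toy, newdata = toy, type = "link")
all.equal(unname(p_pos), unname(plogis(eta)))
```

If the factor levels were in the opposite order, the same call would return the probability of the other class, so checking levels() first is a cheap sanity test.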

For example, a simple regression with a 0/1 response, 1 being the positive class:

library(tidymodels)
set.seed(111)
df = data.frame(y = factor(rbinom(50,1,0.5)),x=runif(50),z=runif(50))
train = df[1:40,]
test = df[41:50,]

logit_mod <- logistic_reg() %>% 
 set_mode("classification") %>% 
 set_engine("glm") %>%
 fit(y ~ x + z, data=train)
res <- predict(logit_mod, new_data = test, type="prob")

This is the prediction for class 1:

res$.pred_1
       41        42        43        44        45        46        47        48 
0.3186626 0.3931925 0.4259043 0.3651420 0.6670263 0.6732433 0.5844562 0.5584770 
       49        50 
0.6791727 0.7567285

Fit the same model with glm() and you can see the predictions are exactly the same:

fit <- glm(y ~ x + z, data=train, family=binomial)
res2 <- predict(fit, newdata=test, type="response")

res2
       41        42        43        44        45        46        47        48 
0.3186626 0.3931925 0.4259043 0.3651420 0.6670263 0.6732433 0.5844562 0.5584770 
       49        50 
0.6791727 0.7567285 
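Rather than comparing the two printouts by eye, the predictions can be checked numerically. The following is a sketch that repeats the example above under the same seed; the names pm, gm, p_parsnip and p_glm are hypothetical, and it assumes the tidymodels packages are installed. It relies on the parsnip fit storing the underlying glm object in its fit element, as the question's logit_mod[["fit"]] suggests.

```r
library(tidymodels)

set.seed(111)
df <- data.frame(y = factor(rbinom(50, 1, 0.5)), x = runif(50), z = runif(50))
train <- df[1:40, ]
test  <- df[41:50, ]

# parsnip model using the glm engine
pm <- logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm") %>%
  fit(y ~ x + z, data = train)

# plain glm with a binomial family (logit link by default)
gm <- glm(y ~ x + z, data = train, family = binomial)

# type = "prob" returns a tibble with one column per class;
# .pred_1 holds the probability of class "1".
p_parsnip <- predict(pm, new_data = test, type = "prob")$.pred_1
p_glm     <- predict(gm, newdata = test, type = "response")

# Compare values only; names are dropped so they do not affect the check.
all.equal(unname(p_parsnip), unname(p_glm))

# The coefficients of the wrapped glm object should match as well.
all.equal(coef(pm$fit), coef(gm))
```

If either all.equal() call reports a difference on your own data, likely places to look are the order of the factor levels of y and whether the column being compared is .pred_1 or .pred_0.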
