I am trying to run a lasso regression on employee turnover, following the code I found here: https://www.kaggle.com/acasalan/predict-bank-turnover-lasso-regression.
In doing so, there are two things that seem strange to me in my results:
- lambda.min and lambda.1se are equal;
- the confusion matrix does not show the positive class.
Here is the code:
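# Load the packages the code below relies on
library(magrittr) # %>%
library(caret)    # createDataPartition()
library(glmnet)   # glmnet(), cv.glmnet()
# Split the data into training and test set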
set.seed(123) # look up the meaning of this value
training.samples <- Dati3$Dimissioni %>%
createDataPartition(p = 0.7, list = FALSE) # randomly split the data into training set (70% for building a predictive model) and test set (30% for evaluating the model)
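train.data <- Dati3[training.samples, ]
test.data <- Dati3[-training.samples, ] # remaining 30% of the rows, used further down as the test set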
x <- model.matrix(Dimissioni~.,train.data)[,-1]
# Outcome (class) variable, used as is (no numeric conversion is applied here)
y <- train.data$Dimissioni
#R function glmnet() [glmnet package] for computing penalized logistic regression.
glmnet(x, y, family = "binomial", alpha = 1, lambda = NULL)
# Find the best lambda using cross-validation
set.seed(123)
cv.lasso <- cv.glmnet(x, y, alpha = 1, family = "binomial")
plot(cv.lasso) # The left dashed vertical line indicates that the log of the optimal value of lambda is approximately -5, which is the one that minimizes the prediction error.
cv.lasso$lambda.min # exact value of lambda
cv.lasso$lambda.1se # value of lambda that gives the simplest model but also lies within one standard error of the optimal value of lambda
# both give the same value: 0.008018156
# Using lambda.min as the best lambda gives the following regression coefficients
coef(cv.lasso, cv.lasso$lambda.min)
# Final model with lambda.min (lambda.1se gives the same model, since the two values are equal)
lasso.model2 <- glmnet(x, y, alpha = 1, family = "binomial",
lambda = cv.lasso$lambda.min)
# Make prediction on test data
x.test <- model.matrix(Dimissioni ~., test.data)[,-1]
probabilities2 <- lasso.model2 %>% predict(newx = x.test)
predicted.classes2 <- ifelse(probabilities2 > 0.5, "pos", "neg")
# Model accuracy
observed.classes2 <- test.data$Dimissioni
mean(predicted.classes2 == observed.classes2)
#confusion matrix
table(predicted.classes2, observed.classes2)
second <- table(predicted.classes2, observed.classes2)
# Precision in correctly predicting employee turnover:
round(second[2,2]/ (second[2,2]+second[2,1]),4)
These are the results of the confusion matrix:
[image of the confusion matrix, not reproduced here]
Thanks for the help.
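Two checks may help narrow this down. The sketch below runs on simulated data, not on Dati3; the names x.sim, y.sim and cv.sim are hypothetical stand-ins. First, lambda.1se is the largest lambda whose cross-validated error lies within one standard error of the minimum, so if no larger lambda on the path meets that bound it coincides with lambda.min, and equal values are not necessarily a mistake. Second, predict() on a binomial glmnet fit returns log-odds by default (type = "link"), so a 0.5 cutoff applied to that output is stricter than a 0.5 cutoff on probabilities and may leave few or no predicted positives. Whether either point explains the results above is an assumption that would need checking on the real data.

```r
library(glmnet)

# Simulated stand-in for the real data (hypothetical, not Dati3)
set.seed(123)
n <- 500
x.sim <- matrix(rnorm(n * 5), ncol = 5)
y.sim <- factor(ifelse(runif(n) < plogis(-2 + 1.5 * x.sim[, 1]), "pos", "neg"))

cv.sim <- cv.glmnet(x.sim, y.sim, alpha = 1, family = "binomial")

# Check 1: rebuild lambda.1se from the cross-validation curve.
# It is the largest lambda with mean CV error <= (minimum error + its standard error).
i.min <- which.min(cv.sim$cvm)
threshold <- cv.sim$cvm[i.min] + cv.sim$cvsd[i.min]
lambda.1se.manual <- max(cv.sim$lambda[cv.sim$cvm <= threshold])
c(min = cv.sim$lambda.min, one.se = cv.sim$lambda.1se, manual = lambda.1se.manual)

# Check 2: compare the default prediction scale with type = "response".
link.scale <- predict(cv.sim, newx = x.sim, s = "lambda.min")                    # log-odds
prob.scale <- predict(cv.sim, newx = x.sim, s = "lambda.min", type = "response") # probabilities

# Cutting log-odds at 0.5 equals cutting probabilities at plogis(0.5), about 0.62
pred.link <- factor(ifelse(link.scale > 0.5, "pos", "neg"), levels = c("neg", "pos"))
pred.prob <- factor(ifelse(prob.scale > 0.5, "pos", "neg"), levels = c("neg", "pos"))

# Fixing the factor levels keeps both rows in the table even if one class is never predicted
table(pred.link, y.sim)
table(pred.prob, y.sim)
```

Applied to the code above, the second check would amount to passing type = "response" in the predict() call before thresholding at 0.5, and making sure the "pos"/"neg" labels match the coding of Dimissioni.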
question from:
https://stackoverflow.com/questions/65830076/problems-with-lasso-regression-lambda-and-confusion-matrix