I have the data that contains tons of x variables that are mainly categorical/nominal and my target variable is a multi-class label. I am able to build a couple models around to predict multi-class variables and compare how each of them performed. I have training and testing data. Both the training and testing data gave me good results.
Now, I am trying to find out "why" did the model predicted certain Y-variable? Meaning if I have weather data: X Variable: city, state, zip code, temp, year; Y Variable: rain, sun, cloudy, snow. I want to find out "why" did the model predict: rain, sun, cloudy, or snow respectfully. I used classification algorithms like multi-nominal, decision tree, ... etc
This may be a broad question but I need somewhere I can start researching. I can predict "what" but I can't see "why" it was predicted as rain, sun, cloudy, or snow label. Basically, I am trying to find the links between the variables that caused to predict the variable.
So far I thought of using correlation matrix, principal component analysis (that happened during model building process)...at least to see which are good predictors and which ones are not. Is there is a way to figure out "why" factor?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…