I am trying to use the randomForest package for classification in R.
The variable importance measures listed are:
- mean raw importance score of variable x for class 0
- mean raw importance score of variable x for class 1
- MeanDecreaseAccuracy
- MeanDecreaseGini
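For context, here is a minimal sketch of how these four columns can be produced. It assumes the randomForest package is installed; the data frame `toy` and the predictors `x1` and `x2` are synthetic and purely illustrative, not my real data.

```r
# Minimal sketch: synthetic two-class data, only to show where the
# importance columns come from. All names below are hypothetical.
library(randomForest)

set.seed(42)
n <- 200
toy <- data.frame(
  x1 = rnorm(n),  # predictor that drives the class label
  x2 = rnorm(n)   # pure-noise predictor
)
toy$y <- factor(ifelse(toy$x1 + rnorm(n, sd = 0.5) > 0, 1, 0))

# importance = TRUE is required to get the per-class columns and
# MeanDecreaseAccuracy; without it only MeanDecreaseGini is reported.
rf <- randomForest(y ~ ., data = toy, importance = TRUE, ntree = 500)

imp <- importance(rf)  # one row per variable, one column per measure
print(imp)
varImpPlot(rf)         # dot charts of MeanDecreaseAccuracy and MeanDecreaseGini
```

The rows of `imp` are the predictors, and the columns are the two per-class scores followed by MeanDecreaseAccuracy and MeanDecreaseGini.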
I already know what these are, in the sense that I know their definitions. What I want to know is how to use them.
Specifically, I want to know how to interpret the values: what is a good value, what is a bad value, what the maximums and minimums are, and so on.
If a variable has a high MeanDecreaseAccuracy or MeanDecreaseGini, does that mean it is important or unimportant? Any information on the raw scores would be useful too.
I want to know everything there is to know about these numbers that is relevant to the application of them.
An explanation that uses the words 'error', 'summation', or 'permutated' would be less helpful than a simpler explanation that doesn't involve any discussion of how random forests work.
For example, if I asked someone to explain how to use a radio, I wouldn't expect the explanation to cover how a radio converts radio waves into sound.