machine learning - Balance classes in cross validation

1 Answer

深蓝 · Answer 1 · 2021-10-17T01:33:39+0000

In class-imbalance settings, artificially balancing the test/validation set does not make sense. These sets must remain realistic: you want to evaluate your classifier under real-world conditions, where, say, the negative class makes up 99% of the samples, to see how well the model predicts the 1% positive class of interest without producing too many false positives. Artificially inflating the minority class or shrinking the majority class yields performance metrics that are unrealistic and bear no relation to the real-world problem you are trying to solve.

For corroboration, here is Max Kuhn, creator of the caret R package and co-author of the (highly recommended) Applied Predictive Modeling textbook, in Chapter 11: Subsampling For Class Imbalances of the caret ebook:

You would never want to artificially balance the test set; its class frequencies should be in-line with what one would see “in the wild”.
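To see why a balanced test set gives misleading numbers, here is a small worked calculation. The sensitivity (0.90) and specificity (0.95) are made-up values for a hypothetical classifier, not figures from the question; the only thing that changes between the two calls is the share of positives in the evaluation set.

```python
def precision(sensitivity, specificity, prevalence):
    """Precision implied by a classifier's error rates at a given positive-class share."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical classifier: catches 90% of positives, wrongly flags 5% of negatives.
balanced = precision(0.90, 0.95, prevalence=0.50)   # artificially balanced test set
realistic = precision(0.90, 0.95, prevalence=0.01)  # 1% positives, as in the wild

print(f"balanced test set:  precision = {balanced:.3f}")   # 0.947
print(f"realistic test set: precision = {realistic:.3f}")  # 0.154
```

The same classifier looks excellent on the balanced set, yet on realistic data roughly five out of six of its positive predictions would be false alarms.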

Re-balancing makes sense only in the training set, so as to prevent the classifier from simply and naively classifying all instances as negative for a perceived accuracy of 99%.

Hence, you can rest assured that, in the setting you describe, the rebalancing is applied only to the training set/folds.
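If you want to check this behaviour yourself, or implement it by hand, the idea can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `stratified_folds` and `oversample_minority` and the toy labels are hypothetical, and random oversampling stands in for whatever rebalancing method your library actually uses.

```python
import random
from collections import Counter

def stratified_folds(labels, k, seed=0):
    """Yield (train_idx, val_idx) pairs whose folds keep the original class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

def oversample_minority(train_idx, labels, seed=0):
    """Duplicate random minority-class indices until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced

# Hypothetical toy data: 990 negatives (0) and 10 positives (1).
labels = [0] * 990 + [1] * 10

fold_summaries = []
for train_idx, val_idx in stratified_folds(labels, k=5):
    # Rebalance the training indices only; the validation fold is left untouched.
    train_bal = oversample_minority(train_idx, labels)
    train_counts = Counter(labels[i] for i in train_bal)
    val_counts = Counter(labels[i] for i in val_idx)
    fold_summaries.append((train_counts, val_counts))
    # A model would be fit on train_bal and scored on val_idx here.

for train_counts, val_counts in fold_summaries:
    print("train:", dict(train_counts), "validation:", dict(val_counts))
```

Each training split ends up with equal class counts, while every validation fold keeps the original 99:1 ratio, so the scores computed on it stay realistic.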
