Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
240 views
in Technique[技术] by (71.8m points)

python - Reduce number of levels for large categorical variables

Are there some ready to use libraries or packages for python or R to reduce the number of levels for large categorical factors?

I want to achieve something similar to R: "Binning" categorical variables but encode into the most frequently top-k factors and "other".

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The R package forcats has fct_lump() for this purpose.

library(forcats)
fct_lump(f, n)

Where f is the factor and n is the number of most common levels to be preserved. The remaining are recoded to Other.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...