Welcome To Ask or Share your Answers For Others

python - Reduce number of levels for large categorical variables


1 Answer

answered Oct 24, 2021 by 深蓝 (71.8m points)

The R package forcats has fct_lump() for this purpose.

library(forcats)
fct_lump(f, n)

Here f is the factor and n is the number of most common levels to keep. All remaining levels are recoded as "Other".
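Since the question is about Python, here is a rough analogue of the same idea using only the standard library. It is a sketch, not a port of fct_lump(): the helper name lump_levels and its other_level parameter are made up for illustration, and tie handling and edge cases may differ from what forcats does.

```python
from collections import Counter

def lump_levels(values, n, other_level="Other"):
    """Keep the n most frequent levels; recode the rest to other_level.

    Hypothetical helper for illustration, not part of any library.
    """
    values = list(values)  # allow any iterable, since we pass over it twice
    counts = Counter(values)
    # most_common(n) returns the n (level, count) pairs with the highest counts
    keep = {level for level, _ in counts.most_common(n)}
    return [v if v in keep else other_level for v in values]

colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
lumped = lump_levels(colors, n=2)
print(lumped)
# ['red', 'red', 'red', 'blue', 'blue', 'Other', 'Other']
```

The same pattern should carry over to a DataFrame column: count the values, pick the top n, and replace everything else with a single catch-all label.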

He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

...