scikit learn - Usng same Label Encoder to test dataset? or new Label Encoder?

Question

Welcome To Ask or Share your Answers For Others

scikit learn - Usng same Label Encoder to test dataset? or new Label Encoder?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

scikit learn - Usng same Label Encoder to test dataset? or new Label Encoder?

I'm totally novice on scikit-learn.

I want to know whether I should use the same Label Encoder instance that had used on training dataset or not when I want to convert the same feature's categorical data on test dataset. And, it means like below

from sklearn import preprocessing

# trainig data label encoding
le_blood_type = preprocessing.LabelEncoder()
df_training[ 'BLOOD_TYPE' ] = le_blood_type.fit_transform( df_training[ 'BLOOD_TYPE' ] )    # labeling from string
....
1. Using same label encoder
   df_test[ 'BLOOD_TYPE' ] = le_blood_type.fit_transform( df_test[ 'BLOOD_TYPE' ] )

2. Using different label encoder
   le_for_test_blood_type = preprocessing.LabelEncoder()
   df_test[ 'BLOOD_TYPE' ] = le_for_test_blood_type.fit_transform( df_test[ 'BLOOD_TYPE' ] )

Which one is right code? Or, whatever I choose the above's code it does not make any differences because training dataset's categorical data and test dataset's categorical data should be the same as a result.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:39:45+0000

The problem is the way you use it in fact.

As LabelEncoder is associating nominal feature to a numeric increment you should fit once and transform once the object has fitted. Don't forget that you need to have all your nominal feature in the training phase.

The good way to use it may be to have you nominal feature, do a fit on it, then only use the transform method.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6]) 
array([0, 0, 1, 2]...)

from official doc

Categories

scikit learn - Usng same Label Encoder to test dataset? or new Label Encoder?

scikit learn - Usng same Label Encoder to test dataset? or new Label Encoder?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags