Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
850 views
in Technique[技术] by (71.8m points)

scikit learn - Usng same Label Encoder to test dataset? or new Label Encoder?

I'm totally novice on scikit-learn.

I want to know whether I should use the same Label Encoder instance that had used on training dataset or not when I want to convert the same feature's categorical data on test dataset. And, it means like below

from sklearn import preprocessing

# trainig data label encoding
le_blood_type = preprocessing.LabelEncoder()
df_training[ 'BLOOD_TYPE' ] = le_blood_type.fit_transform( df_training[ 'BLOOD_TYPE' ] )    # labeling from string
....
1. Using same label encoder
   df_test[ 'BLOOD_TYPE' ] = le_blood_type.fit_transform( df_test[ 'BLOOD_TYPE' ] )

2. Using different label encoder
   le_for_test_blood_type = preprocessing.LabelEncoder()
   df_test[ 'BLOOD_TYPE' ] = le_for_test_blood_type.fit_transform( df_test[ 'BLOOD_TYPE' ] )

Which one is right code? Or, whatever I choose the above's code it does not make any differences because training dataset's categorical data and test dataset's categorical data should be the same as a result.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The problem is the way you use it in fact.

As LabelEncoder is associating nominal feature to a numeric increment you should fit once and transform once the object has fitted. Don't forget that you need to have all your nominal feature in the training phase.

The good way to use it may be to have you nominal feature, do a fit on it, then only use the transform method.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6]) 
array([0, 0, 1, 2]...)

from official doc


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...