I am currently working on a text classification program. I extracted some features with the TfidfVectorizer.
Now I want to delete some words from the original feature list because they do not provide useful information.
I have two questions:
- Where are the features stored (via .get_feature_names() or in X_train_union_tfidf)?
- How can I delete them?
I found a quite similar question here: "Ignore a column while building a model with SKLearn", but I cannot link it to my problem.
Code:
X_train_union_tfidf = combined_tfidf.fit_transform(X_train)
X_test_union_tfidf = combined_tfidf.transform(X_test)
print(feature_union_df_tfidf)
unigram__compris 15.844468
unigram__devic 16.797861
bigram__speech recognit 17.065831
bigram__invent relat 17.527465
bigram__present invent 21.158065
Let's say I want to delete the features "invent relat" and "present invent". How do I delete them from X_train_union_tfidf (a sparse matrix) before passing it to the classification algorithm?
naive_bayes = MultinomialNB()
naive_bayes.fit(X_train_union_tfidf, y_train)
predictions_NB_tfidf = naive_bayes.predict(X_test_union_tfidf)
predicted_prob_NB_tf = naive_bayes.predict_proba(X_test_union_tfidf)
question from:
https://stackoverflow.com/questions/66061621/how-do-i-delete-features-from-feature-list-tfidfvectorizer