Welcome To Ask or Share your Answers For Others

list - Count Words in Python


1 Answer

answered Oct 24, 2021 by 深蓝 (71.8m points)

If you don't mind installing a new Python library, I suggest you use gensim. The first tutorial in the gensim documentation does exactly what you ask:

# `documents` is assumed to be a list of strings, one string per document
# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]
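Since the question is about counting words, note that once the documents are tokenized as above, plain Python's `collections.Counter` can already tally the words without gensim. This is a minimal sketch; the two sample sentences are made up for illustration and stand in for your own `documents`:

```python
from collections import Counter

# Hypothetical sample corpus: one string per document
documents = [
    "The cat sat on the mat",
    "The dog chased the cat",
]

# Same stop-word filtering and tokenization as in the snippet above
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# Count how often each remaining word occurs across the whole corpus
word_counts = Counter(word for text in texts for word in text)
print(word_counts.most_common(3))
```

`Counter` gives corpus-wide totals; the gensim route below keeps the counts per document, which is what you need for tf-idf or LDA.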

You will then need to create a dictionary for your corpus of documents and build the bag-of-words representation.

from gensim import corpora
dictionary = corpora.Dictionary(texts)
dictionary.save('/tmp/deerwester.dict')  # store the dictionary, for future reference
print(dictionary)
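Roughly speaking, `corpora.Dictionary` assigns an integer id to every distinct token, and a bag-of-words document is a list of `(token_id, count)` pairs. The plain-Python sketch below imitates that idea so it is visible without installing gensim. It is not gensim's implementation: the helper names `build_vocabulary` and `to_bag_of_words` are hypothetical, and the ids here follow first-seen order, which may differ from the ids gensim assigns.

```python
from collections import Counter

def build_vocabulary(texts):
    """Map each distinct token to an integer id, in first-seen order (hypothetical helper)."""
    token2id = {}
    for text in texts:
        for token in text:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bag_of_words(text, token2id):
    """Return sorted (token_id, count) pairs for one document, skipping unknown tokens."""
    counts = Counter(token2id[token] for token in text if token in token2id)
    return sorted(counts.items())

# Hypothetical tokenized documents, as produced by the tokenization step
texts = [["cat", "sat", "on", "mat"], ["dog", "chased", "cat"]]
token2id = build_vocabulary(texts)
corpus = [to_bag_of_words(text, token2id) for text in texts]
print(token2id)
print(corpus)
```

With gensim itself, the bag-of-words step from the tutorial is `corpus = [dictionary.doc2bow(text) for text in texts]`.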

After that, you can easily weight the result with tf-idf and run LDA (Latent Dirichlet Allocation) on it.
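To give a feel for what tf-idf weighting does to a bag-of-words corpus, here is a plain-Python sketch of one common variant: a term's weight in a document is its count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. This is only an illustration and not gensim's `models.TfidfModel`, which applies its own weighting and normalization defaults; the function name `tfidf` is hypothetical.

```python
import math

def tfidf(corpus):
    """Weight bag-of-words documents by count * log(N / df) (hypothetical helper).

    `corpus` is a list of documents, each a list of (token_id, count) pairs.
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each token id appears
    df = {}
    for doc in corpus:
        for token_id, _ in doc:
            df[token_id] = df.get(token_id, 0) + 1
    weighted = []
    for doc in corpus:
        weights = [(token_id, count * math.log(n_docs / df[token_id]))
                   for token_id, count in doc]
        # Terms present in every document get weight 0 (log(1) == 0) and are dropped
        weighted.append([(t, w) for t, w in weights if w > 0])
    return weighted

# Hypothetical bag-of-words corpus: token 0 appears in both documents
corpus = [[(0, 1), (1, 1), (2, 1), (3, 1)], [(0, 1), (4, 1), (5, 1)]]
for doc in tfidf(corpus):
    print(doc)
```

Terms that occur everywhere (here token 0) carry no information for telling documents apart, so their weight collapses to zero.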

Have a look at the first tutorial in the gensim documentation for the full walkthrough.

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

...