Open-source software name: Naive Bayesian Classifier

Open-source software URL: https://gitee.com/mirrors/naive-bayes-classifier

Open-source software description:

# Naive Bayesian Classifier

Yet another general-purpose Naive Bayesian classifier.

## Installation

You can install this package using the following command:

    $ sudo pip install naiveBayesClassifier

## Example

    """
    Suppose you have some news texts and know their categories.
    You want to train a system with these pre-categorized/pre-classified texts.
    So, you had better call this data your training set.
    """
    from naiveBayesClassifier import tokenizer
    from naiveBayesClassifier.trainer import Trainer
    from naiveBayesClassifier.classifier import Classifier

    newsTrainer = Trainer(tokenizer.Tokenizer(stop_words = [], signs_to_remove = ["?!#%&"]))

    # You need to train the system by passing each text, one by one, to the trainer module.
    newsSet = [
        {'text': 'not to eat too much is not enough to lose weight', 'category': 'health'},
        {'text': 'Russia is trying to invade Ukraine', 'category': 'politics'},
        {'text': 'do not neglect exercise', 'category': 'health'},
        {'text': 'Syria is the main issue, Obama says', 'category': 'politics'},
        {'text': 'eat to lose weight', 'category': 'health'},
        {'text': 'you should not eat much', 'category': 'health'}
    ]

    for news in newsSet:
        newsTrainer.train(news['text'], news['category'])

    # When you have sufficient training data, you are almost done and can start to use
    # a classifier.
    newsClassifier = Classifier(newsTrainer.data, tokenizer.Tokenizer(stop_words = [], signs_to_remove = ["?!#%&"]))

    # Now you have a classifier that can try to classify news text whose
    # category is not yet known.
    unknownInstance = "Even if I eat too much, is not it possible to lose some weight"
    classification = newsClassifier.classify(unknownInstance)

    # The classification variable holds the possible categories sorted by
    # their probability value.
    print(classification)

Note: You will definitely need much more training data than the amount in the example above.
Really, a few lines of text like those in the example are nowhere near a sufficient training set.

## What is the Naive Bayes Theorem and Classifier

There is no need to explain everything once again here. Instead, one of the most eloquent explanations is quoted. The following explanation is quoted from another Bayes classifier, which is written in Go.
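Separately from that quoted explanation, the core idea can be sketched in a few lines of Python. This is a minimal illustration, not this package's implementation: the helper names `tokenize`, `train`, and `classify`, the whitespace tokenizer, and the add-one (Laplace) smoothing are all assumptions made for the example. Each category is scored by log P(category) plus the sum of log P(token | category) over the tokens of the text, and the categories are returned best first.

```python
# Python 3 sketch; all names here are hypothetical, not the package's API.
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(samples):
    """Count documents per category and token occurrences per category."""
    doc_counts = Counter()
    token_counts = defaultdict(Counter)
    for text, category in samples:
        doc_counts[category] += 1
        token_counts[category].update(tokenize(text))
    return doc_counts, token_counts


def classify(text, doc_counts, token_counts):
    """Return (category, log score) pairs sorted from best to worst.

    Add-one smoothing keeps unseen tokens from driving a score to -infinity.
    """
    vocabulary = {t for counts in token_counts.values() for t in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for category, n_docs in doc_counts.items():
        counts = token_counts[category]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / denominator)  # log likelihood
        scores[category] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


samples = [
    ("not to eat too much is not enough to lose weight", "health"),
    ("Russia is trying to invade Ukraine", "politics"),
    ("do not neglect exercise", "health"),
    ("Syria is the main issue, Obama says", "politics"),
    ("eat to lose weight", "health"),
    ("you should not eat much", "health"),
]
doc_counts, token_counts = train(samples)
ranking = classify(
    "Even if I eat too much, is not it possible to lose some weight",
    doc_counts,
    token_counts,
)
print(ranking)
```

With this toy training set, `health` ranks first for the sample sentence. Working in log space avoids numeric underflow when many small probabilities are multiplied together.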
If you are curious about the Naive Bayes theorem, you may find the following list helpful:

# Improvements

This classifier uses a very simple tokenizer, which is just a module that splits sentences into words. If your training set is large, you can rely on the available tokenizer; otherwise, you need a better tokenizer specialized for the language of your training texts.

## TODO
## AUTHORS