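NMF and LDA Topic Extraction

Introduction

Non-negative Matrix Factorization is abbreviated as NMF. Latent Dirichlet Allocation is abbreviated as LDA. This article applies both to topic extraction.

NMF is applied with two different objective functions: the Frobenius norm and the generalized Kullback-Leibler divergence. The latter is equivalent to Probabilistic Latent Semantic Indexing (PLSA).

With the default parameters (n_samples / n_features / n_components), the example finishes in a few tens of seconds. You can try increasing the dimensions of the problem, but watch the time complexity so that the run does not take too long: NMF has polynomial time complexity, while the time complexity of LDA is proportional to n_samples * iterations (the number of samples times the number of iterations).

Code implementation [Python]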
Code execution

Approximate run time: 0 min 13.781 s.

Loading dataset...
done in 7.911s.
Extracting tf-idf features for NMF...
done in 0.268s.
Extracting tf features for LDA...
done in 0.254s.

Fitting the NMF model (Frobenius norm) with tf-idf features, n_samples=2000 and n_features=1000...
done in 0.406s.

Topics in NMF model (Frobenius norm):
Topic #0: just people don think like know time good make way really say right ve want did ll new use years
Topic #1: windows use dos using window program os drivers application help software pc running ms screen files version card code work
Topic #2: god jesus bible faith christian christ christians does heaven sin believe lord life church mary atheism belief human love religion
Topic #3: thanks know does mail advance hi info interested email anybody looking card help like appreciated information send list video need
Topic #4: car cars tires miles 00 new engine insurance price condition oil power speed good 000 brake year models used bought
Topic #5: edu soon com send university internet mit ftp mail cc pub article information hope program mac email home contact blood
Topic #6: file problem files format win sound ftp pub read save site help image available create copy running memory self version
Topic #7: game team games year win play season players nhl runs goal hockey toronto division flyers player defense leafs bad teams
Topic #8: drive drives hard disk floppy software card mac computer power scsi controller apple mb 00 pc rom sale problem internal
Topic #9: key chip clipper keys encryption government public use secure enforcement phone nsa communications law encrypted security clinton used legal standard

Fitting the NMF model (generalized Kullback-Leibler divergence) with tf-idf features, n_samples=2000 and n_features=1000...
done in 1.769s.

Topics in NMF model (generalized Kullback-Leibler divergence):
Topic #0: just people don like did know make really right think say things time look way didn ve course probably good
Topic #1: help thanks windows know hi need using does looking anybody appreciated card mail software use info email ftp available pc
Topic #2: does god believe know mean true christians read point jesus christian church come people fact says religion say agree bible
Topic #3: know thanks mail interested like new just bike email edu advance want contact really list heard com post hear information
Topic #4: 10 new 30 12 20 50 11 sale 16 15 time 14 old power ago good 100 great offer cost
Topic #5: number 1993 data subject government new numbers provide information space following com research include large note group major time talk
Topic #6: edu problem file com remember try soon article mike files code program sun free send think cases manager little called
Topic #7: game year team games world fact second case won said win division play best clearly claim allow example used doesn
Topic #8: think don drive hard need bit mac make sure read apple going comes disk computer case pretty drives software ve
Topic #9: good just use like doesn got way don ll going does chip better doing bad key want sure bit car

Fitting LDA models with tf features, n_samples=2000 and n_features=1000...
done in 3.167s.

Topics in LDA model:
Topic #0: edu com mail send graphics ftp pub available contact university list faq ca information cs 1993 program sun uk mit
Topic #1: don like just know think ve way use right good going make sure ll point got need really time doesn
Topic #2: christian think atheism faith pittsburgh new bible radio games alt lot just religion like book read play time subject believe
Topic #3: drive disk windows thanks use card drives hard version pc software file using scsi help does new dos controller 16
Topic #4: hiv health aids disease april medical care research 1993 light information study national service test led 10 page new drug
Topic #5: god people does just good don jesus say israel way life know true fact time law want believe make think
Topic #6: 55 10 11 18 15 team game 19 period play 23 12 13 flyers 20 25 22 17 24 16
Topic #7: car year just cars new engine like bike good oil insurance better tires 000 thing speed model brake driving performance
Topic #8: people said did just didn know time like went think children came come don took years say dead told started
Topic #9: key space law government public use encryption earth section security moon probe enforcement keys states lunar military crime surface technology