This article collects typical usage examples of the Python function nltk.util.bigrams. If you have been wondering how exactly bigrams is used, how to call it, or what real-world code that uses it looks like, the selected examples below should help.
Twenty code examples of the bigrams function are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your votes help the site recommend better Python code samples.
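As a quick orientation before the examples, here is a minimal sketch of calling bigrams directly (the tokens are made up). Note that since NLTK 3 the function returns a generator, so wrap it in list() if you need to index, reuse, or print the result:

from nltk.util import bigrams

tokens = ['the', 'dog', 'runs', 'fast']
pairs = list(bigrams(tokens))  # materialize the generator
print(pairs)                   # [('the', 'dog'), ('dog', 'runs'), ('runs', 'fast')]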
Example 1: wiki_to_feature
def wiki_to_feature(wiki):
    """
    Specifically handles a single wiki document
    :param wiki: dict for wiki fields
    :type wiki: dict
    :return: tuple with wiki id and list of feature strings
    :rtype: tuple
    """
    try:
        features = []
        bow = []
        features += [u'ORIGINAL_HUB:%s' % wiki.get(u'hub_s', u'')]
        features += [u'TOP_CAT:%s' % u'_'.join(normalize(c)) for c in wiki.get(u'top_categories_mv_en', [])]
        bow += [u"_".join(normalize(c)) for c in wiki.get(u'top_categories_mv_en', [])]
        features += [u'TOP_ART:%s' % u"_".join(normalize(a)) for a in wiki.get(u'top_articles_mv_en', [])]
        bow += [u"_".join(normalize(a)) for a in wiki.get(u'top_articles_mv_en', [])]
        desc_ngrams = [u"_".join(n) for grouping in
                       [bigrams(normalize(np))
                        for np in TextBlob(wiki.get(u'description_txt', [u''])[0]).noun_phrases]
                       for n in grouping]
        bow += desc_ngrams
        features += [u'DESC:%s' % d for d in desc_ngrams]
        bow += [u"_".join(b) for b in bigrams(normalize(wiki[u'sitename_txt'][0]))]
        mp_nps = TextBlob(wiki.get(u'main_page_text', u'')).noun_phrases
        bow += [u"_".join(bg) for grouping in [bigrams(normalize(n)) for n in mp_nps] for bg in grouping]
        bow += [u''.join(normalize(w)) for words in [np.split(u" ") for np in mp_nps] for w in words]
        return wiki[u'id'], bow + features
    except Exception as e:
        print(e, format_exc())
        raise e
Author: Wikia, Project: data-science-toolkit, Lines: 30, Source: extract_wiki_data.py
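A hypothetical call to the function above, assuming the helper normalize and the TextBlob import from the source module are available. The field names come from the code; the values here are invented purely for illustration:

sample_wiki = {
    u'id': u'831',
    u'hub_s': u'Gaming',
    u'top_categories_mv_en': [u'Characters', u'Weapons'],
    u'top_articles_mv_en': [u'Main Page'],
    u'description_txt': [u'A wiki about an open world role playing game'],
    u'sitename_txt': [u'Example Game Wiki'],
    u'main_page_text': u'Welcome to the example game wiki',
}
wiki_id, feature_strings = wiki_to_feature(sample_wiki)  # returns the id plus bag-of-words and prefixed feature strings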
Example 2: getFeatures
def getFeatures(tokens, typefeat='unigrams'):
    if typefeat == 'unigrams':
        _features = FreqDist(tokens)
    elif typefeat == 'bigrams':
        _bigrams = bigrams(tokens)
        _features = FreqDist(_bigrams)
    elif typefeat == 'uni+bigrams':
        _bigrams = list(bigrams(tokens))  # list() needed: bigrams() returns a generator in NLTK 3
        _features = FreqDist(_bigrams + tokens)
    return _features
Author: diegocaro, Project: opinionapp, Lines: 14, Source: features.py
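A quick illustration of the three modes with made-up tokens (FreqDist and bigrams are presumably imported from nltk in the source file):

tokens = ['i', 'really', 'really', 'like', 'this', 'movie']
print(getFeatures(tokens, 'unigrams')['really'])              # 2
print(getFeatures(tokens, 'bigrams')[('really', 'really')])   # 1
print(getFeatures(tokens, 'uni+bigrams').N())                 # 11 = 6 unigrams + 5 bigrams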
Example 3: score_by_topic
def score_by_topic(pkg, scores):
    '''Examines the pkg and adds scores according to topics in it.'''
    themes = Themes.instance()
    for level in range(3):
        pkg_text = package_text(pkg, level)
        words, words_without_stopwords = normalize_text(pkg_text)
        for num_words in (1, 2, 3):
            if num_words == 1:
                ngrams = words_without_stopwords
                topic_ngrams = themes.topic_words
                topic_ngrams_set = themes.topic_words_set
            elif num_words == 2:
                ngrams = list(bigrams(words))  # materialize: .count() below needs a list, not a generator
                topic_ngrams = themes.topic_bigrams
                topic_ngrams_set = themes.topic_bigrams_set
            elif num_words == 3:
                ngrams = list(trigrams(words))
                topic_ngrams = themes.topic_trigrams
                topic_ngrams_set = themes.topic_trigrams_set
            matching_ngrams = set(ngrams) & topic_ngrams_set
            if matching_ngrams:
                for ngram in matching_ngrams:
                    occurrences = ngrams.count(ngram)
                    score = (3 - level) * occurrences * num_words
                    theme = topic_ngrams[ngram]
                    ngram_printable = ' '.join(ngram) if isinstance(ngram, tuple) else ngram
                    reason = '"%s" matched %s' % (ngram_printable, LEVELS[level])
                    if occurrences > 1:
                        reason += ' (%s times)' % occurrences
                    scores[theme].append((score, reason))
                    log.debug(' %s %s %s', theme, score, reason)
Author: palcu, Project: ckanext-dgu, Lines: 31, Source: theme.py
Example 4: aggregate_topics_of_segmented_reports
def aggregate_topics_of_segmented_reports(self, cut_of_segmented_reports, topics):
    aggregated_topics = []
    # list() needed: bigrams() returns a generator in NLTK 3, and the loop below takes its length and indexes into it
    bigrams_of_topics = list(bigrams(map(lambda x: [x.decode('utf-8')], topics)))
    for i in range(len(bigrams_of_topics)):
        for j in range(len(cut_of_segmented_reports)):
            aggregated_topics.extend(cut_of_segmented_reports[j][cut_of_segmented_reports[j].index(bigrams_of_topics[i][0]):cut_of_segmented_reports[j].index(bigrams_of_topics[i][1])])
    return aggregated_topics
Author: EduardoCarvalho, Project: nltkPhraseDetector, Lines: 7, Source: extractPhrases.py
Example 5: autocorrect_query
def autocorrect_query(query, df, cutoff=0.8, warning_on=True):
    """
    autocorrect a query based on the training set
    """
    train_data = df.values[df['search_term'].values == query, :]
    s = ""
    for r in train_data:
        w = r
        s = "%s %s %s" % (s, BeautifulSoup(r[1]).get_text(" ", strip=True), BeautifulSoup(r[2]).get_text(" ", strip=True))
    s = re.findall(r'[\'\"\w]+', s.lower())
    s_bigram = [' '.join(i) for i in bigrams(s)]
    s.extend(s_bigram)
    corrected_query = []
    for q in query.lower().split():
        if len(q) <= 2:
            corrected_query.append(q)
            continue
        if bool(re.search('\d', q)):  # skip if it is word with number, like 4.5in_
            corrected_query.append(q)
            continue
        corrected_word = difflib.get_close_matches(q, s, n=1, cutoff=cutoff)
        if len(corrected_word) > 0:
            corrected_query.append(corrected_word[0])
        else:
            if warning_on:
                print("WARNING: cannot find matched word for '%s' -> used the original word" % (q))
            corrected_query.append(q)
    return ' '.join(corrected_query)
Author: aaxwaz, Project: Kaggle_HomeDepot_Stacking, Lines: 28, Source: utils.py
Example 6: generate_unibitrigrams
def generate_unibitrigrams(key_score_file):
    with open(key_score_file, 'rb') as infile:
        infile.readline()
        key_list = list()
        for line in infile:
            row = list(line.split(','))
            key_list.append(row[0])
    uni_bi_trigrams = []
    for phrase in key_list:
        words = []
        unigrams_ls = []
        bigrams_ls = []
        trigrams_ls = []
        for word in nltk.word_tokenize(phrase):
            word = re.sub('[!"#$%&\'\(\)*+,-./:;<=>?@[\]\^_`{|}~]', '', word)
            words.append(word)
        unigrams_ls = words
        #bigrams_ls=list(bigrams(words))
        for x in list(bigrams(words)):
            bigrams_ls.append(x[0] + ' ' + x[1])
        for x in list(trigrams(words)):
            trigrams_ls.append(x[0] + ' ' + x[1] + ' ' + x[2])
        #trigrams_ls=list(trigrams(words))
        uni_bi_trigrams = uni_bi_trigrams + unigrams_ls + bigrams_ls + trigrams_ls
    return uni_bi_trigrams
Author: neethukurian, Project: keyextract, Lines: 28, Source: rake_stem.py
Example 7: gender_feature
def gender_feature(text, feature_vect):
    """
    Extract the gender features
    :param text:
    :param feature_vect: contains a bag of words and a list of bigrams
    :return: a dictionary which contains the feature and its computed value
    """
    # sentence length and vocab features
    tokens = word_tokenize(text.lower())
    sentences = sent_tokenize(text.lower())
    words_per_sent = np.asarray([len(word_tokenize(s)) for s in sentences])
    # bag_of_word features
    bag_dict = {}
    for bag in feature_vect[:29]:
        bag_dict[bag] = bag in tokens
    # bigrams features
    bigram_dict = {}
    for big in feature_vect[29:]:
        bigram_dict[big] = big in bigrams(tokens)
    # POS tagging features
    POS_tag = ['ADJ', 'ADV', 'DET', 'NOUN', 'PRT', 'VERB', '.']
    tagged_word = parse(text, chunks=False, tagset='UNIVERSAL').split()
    simplified_tagged_word = [(tag[0], map_tag('en-ptb', 'universal', tag[1])) for s in tagged_word for tag in s]
    freq_POS = nltk.FreqDist(tag[1] for tag in simplified_tagged_word if tag[1] in POS_tag)
    d = dict({'sentence_length_variation': words_per_sent.std()}, **bag_dict)
    return dict(dict(d, **bigram_dict), **freq_POS)
Author: kouki01, Project: Text_Mining_University_Project, Lines: 31, Source: Evaluation.py
Example 8: get_bigram
def get_bigram(text_list):
    # text_list is a list of strings
    new_list = []
    for i in range(len(text_list)):
        new_list.append(list(bigrams(text_list[i])))
    return new_list
Author: sheshant, Project: project-information-retrieval, Lines: 7, Source: new.py
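Worth noting: because each element of text_list is a string, bigrams() iterates over its characters here, so the function returns character-level bigrams. A small made-up call:

print(get_bigram(['hello', 'hi']))
# [[('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')], [('h', 'i')]]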
Example 9: BigramAll
def BigramAll():
    to_save_folder = "./#Bigram[.]/"
    folder_list = os.listdir("./")
    for folder in folder_list:
        if folder.find(".") != -1:
            continue
        folder_name = "./" + folder + "/"
        data_path = folder_name + "data.doc"
        fw = open(data_path, "r", encoding="utf8")
        text = fw.read()
        words = word_tokenize(text)
        big = list(bigrams(w for w in words if len(w) > 1 and w != "``"))
        myBig = []
        for bi in big:
            myBig.append(bi[0] + " " + bi[1])
        fdist = FreqDist(str(w) for w in myBig)
        keys = fdist.most_common(len(fdist.keys()))
        dataFreq = ""
        for key in keys:
            dataFreq += str(key[0]).strip() + "," + str(key[1]).strip() + "\n"
        make_sure_path_exists(to_save_folder + folder)
        writer = open(to_save_folder + folder + "/" + folder + "[bigram_Freq].csv", "w+", encoding="utf8")
        writer.write(dataFreq)
        fw.close()
        writer.close()
Author: olee12, Project: Stylogenetics, Lines: 29, Source: MakeNormalData.py
Example 10: generate_ds
def generate_ds(self, words):
    learning_info_dict = {lang: {w: float(t)
                                 for w, t in self._language_model_cfd[lang].most_common()}
                          for lang in self._language_model_cfd.keys()}
    testing_info_dict = {w: float(t)
                         for w, t in FreqDist([tpl for word in words for tpl in bigrams(word)]).most_common()}
    return learning_info_dict, testing_info_dict
Author: PyWilhelm, Project: FoLT2014, Lines: 7, Source: core.py
Example 11: bigramsPhi
def bigramsPhi(comment):
    """The basis for a bigrams feature function.
    """
    sent = [stemmer.stem(tok) for tok in comment.split()]  # Stemming + punc
    unis = Counter()
    sent = ["<<START>>"] + sent + ["<<END>>"]
    unis.update(bigrams(sent))  # Bigrams
    return unis
Author: alexsax, Project: abusive-comment-detection, Lines: 8, Source: baseline.py
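A hypothetical call, assuming the module-level stemmer is something like NLTK's PorterStemmer (the snippet does not show that definition):

from collections import Counter
from nltk.stem import PorterStemmer
from nltk.util import bigrams

stemmer = PorterStemmer()  # assumption: the source module defines a stemmer along these lines
feats = bigramsPhi("not a fan of this film")
# feats is a Counter keyed by (previous_token, token) pairs, including the
# ('<<START>>', ...) and (..., '<<END>>') boundary bigrams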
Example 12: perplexity
def perplexity(self, sentence, method):
    """
    Compute the perplexity of a sentence given an estimation method
    You do not need to modify this code.
    """
    return 2.0 ** (-1.0 * mean([method(context, word) for context, word in
                                bigrams(self.tokenize_and_censor(sentence))]))
Author: sangheestyle, Project: cl1-hw, Lines: 8, Source: language_model.py
Example 13: bigram_format
def bigram_format(test_corpus):
    """
    >>> bigram_format(["the dog runs STOP", "the cat walks STOP", "the dog runs STOP"])
    [[('the', 'dog'), ('dog', 'runs'), ('runs', 'STOP')], [('the', 'cat'), ('cat', 'walks'), ('walks', 'STOP')], [('the', 'dog'), ('dog', 'runs'), ('runs', 'STOP')]]
    """
    wl = [[word for word in sentence.split()] for sentence in test_corpus]
    return [list(util.bigrams(l)) for l in wl]  # list() so the result matches the doctest under NLTK 3, where bigrams() is a generator
Author: manniche, Project: nlangp, Lines: 8, Source: ngram_utilities.py
Example 14: get_ngram_tokens
def get_ngram_tokens(self, line):
    tokens = nltk.wordpunct_tokenize(line)
    message = [self.stemmer.stem(x) for x in tokens if len(x) > 2 and x not in self.stops]
    bigram = list(bigrams(message))  # materialize first: iterating the lazy generator while appending to message below would never terminate
    for pair in bigram:
        joined = " ".join(pair)
        message.append(joined)
    return list(set(message))
Author: johnnysparks, Project: feelsbro, Lines: 8, Source: johnnyprocess.py
Example 15: sentProbaility
def sentProbaility(self, sent, smooth_const):
    V = 217847
    tool = MyToolKit()
    bigrs = bigrams(tool.words(sent))
    p = 1
    for tuple in bigrs:
        p = math.exp(math.log(p) + math.log(self.LaplaceSmoothing(tuple[1], tuple[0], smooth_const, V)))
        #p = math.exp(math.log(p)+math.log(self.AbsoluteDiscountingSmoothing(tuple[1],tuple[0],smooth_const,V)))
    return p
Author: djidan10, Project: Arabic-Diacritizer, Lines: 9, Source: Vocaliser.py
Example 16: handleGrams
def handleGrams(self, tokenList):
    res = []
    if self.unigrams:
        res.extend(tokenList)
    if self.bigrams:
        res.extend(bigrams(tokenList))
    if self.gappyBigrams:
        res.extend(self.gappy_bigrams(tokenList))
    return res
Author: mhaas, Project: ma-thesis, Lines: 9, Source: feature_extraction.py
Example 17: add_train
def add_train(self, sentence):
    """
    Add the counts associated with a sentence.
    """
    # You'll need to complete this function, but here's a line of code that
    # will hopefully get you started.
    for context, word in bigrams(self.tokenize_and_censor(sentence)):
        None
Author: jvieitez, Project: cl1-hw, Lines: 9, Source: language_model.py
Example 18: process
def process(self, filename):
    """process"""
    in_file = open(filename)
    self.content[filename] = in_file.read()
    in_file.close()
    words = self.content[filename].split(' ')
    grams = bigrams(words)
    self.add_grams(filename, grams)
Author: slacy, Project: linky, Lines: 9, Source: build.py
Example 19: get_feature_by_all_bigrams
def get_feature_by_all_bigrams(self, bgs):
    bg_counts = list()
    for statuses in self._author_statuses:
        count = 0
        for status in statuses:
            for bg in bigrams(status):
                if bg in bgs:
                    count += 1
        bg_counts.append(count)
    return bg_counts
Author: artir, Project: cl2_project, Lines: 10, Source: word_feature.py
Example 20: classify_paras
def classify_paras(paras, classifier):
    d = collections.defaultdict(list)
    for para in paras:
        words = [w.lower() for w in itertools.chain(*para)]
        feats = dict([(w, True) for w in words + list(bigrams(words))])  # list(): bigrams() is a generator in NLTK 3
        label = classifier.classify(feats)
        d[label].append(" ".join(words))
    return d
Author: B-Rich, Project: PyCon-NLTK-Tutorial, Lines: 10, Source: explore_nltk.py
Note: The nltk.util.bigrams examples in this article were collected by 纯净天空 from open-source projects hosted on GitHub, MSDocs, and other source-code and documentation platforms. The snippets were selected from projects contributed by their respective open-source authors; copyright remains with the original authors, and any redistribution or use should follow the corresponding project's license. Please do not reproduce without permission.