Python sessioninfomanager.updateSessionInfo Function Code Examples



This article collects typical usage examples of the utils.sessioninfomanager.updateSessionInfo function in Python. If you are wondering how updateSessionInfo is used in practice, how to call it, or what real-world calls look like, the selected code examples here may help.

A total of 20 code examples of the updateSessionInfo function are shown below, sorted by popularity by default.
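The examples on this page all call updateSessionInfo with the same six positional arguments, and only when self.linksOut (or a similar list of pending links) is non-empty. The sketch below illustrates that calling pattern only. The real implementation in utils.sessioninfomanager is not shown on this page, so `record_session_info` is a hypothetical stand-in, and its parameter names are assumptions chosen for readability.

```python
from datetime import datetime


def record_session_info(genre, session_info_out, marker, hash_value,
                        object_type, update_flag):
    """Hypothetical stand-in for updateSessionInfo.

    It only stores its arguments so the call shape is visible; it does
    NOT reproduce the behavior of the real CrawlerFramework function.
    """
    session_info_out[object_type] = {
        "genre": genre,
        "marker": marker,
        "hash": hash_value,
        "update": update_flag,
    }
    return session_info_out


# Mirrors the pattern in the examples: record session info only when
# at least one outgoing link (task) was collected.
links_out = ["http://forum.example.com/thread/1"]  # placeholder URL
session_info_out = {}
last_timestamp = datetime(1980, 1, 1)

if links_out:
    record_session_info("Search", session_info_out, last_timestamp,
                        None, "ForumThreadsPage", False)
```

In the connectors themselves, the third argument is the latest thread timestamp gathered during pagination and the last argument comes from self.task.instance_data.get('update').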

Example 1: __createTasksForThreads

 def __createTasksForThreads(self):
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     self.__total_threads_count = 0
     self.__last_timestamp = datetime(1980, 1, 1)
     # Maximum number of threads to process, because not all forums are
     # updated every day; at most it will be 100
     self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                         'anandtechforums_maxthreads'))
     self.__setSoupForCurrentUri()
     while self.__getThreads():
         try:
             next_page_uri = self.soup.find('a', text='&gt;',rel='Next').parent['href']
             data_dict = dict(parse_qsl(next_page_uri.split('?')[-1]))
             if 's' in data_dict.keys():
                 data_dict.pop('s')
             self.currenturi = self.__baseuri + 'forumdisplay.php?'+ urlencode(data_dict)                    
             self.__setSoupForCurrentUri()
         except:
             log.exception(self.log_msg('Next Page link not found for url \
                                                 %s'%self.currenturi))
             break
     log.info(self.log_msg('# of Tasks Added is %d'%len(self.linksOut)))
     #self.linksOut = []
     if self.linksOut:
         updateSessionInfo('Search', self.session_info_out, \
                 self.__last_timestamp , None, 'ForumThreadsPage', \
                 self.task.instance_data.get('update'))
     return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 31, Source: anandtechforumsconnector.py
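Example 1 rebuilds the next-page URL after dropping the `s` query parameter (apparently a session id) so that the same page is not treated as a new URL on each crawl. A minimal Python 3 sketch of that step follows; the function name `strip_session_param` and the example.com base URI are illustrative assumptions, not part of CrawlerFramework.

```python
from urllib.parse import parse_qsl, urlencode


def strip_session_param(next_page_uri, base_uri, param="s"):
    """Rebuild a forumdisplay.php URL without the given query parameter."""
    # Take everything after the last '?' and parse it into key/value pairs.
    query = dict(parse_qsl(next_page_uri.split("?")[-1]))
    # Drop the session parameter if it is present.
    query.pop(param, None)
    return base_uri + "forumdisplay.php?" + urlencode(query)


cleaned = strip_session_param(
    "forumdisplay.php?s=abc123&f=12&page=2",
    "http://forums.example.com/",  # hypothetical base URI
)
```

Using `dict.pop(param, None)` avoids the separate membership check that the original code performs with `'s' in data_dict.keys()`.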

Example 2: __createTasksForThreads

 def __createTasksForThreads(self):
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     self.__total_threads_count = 0
     self.__last_timestamp = datetime( 1980,1,1 )
     self.__setSoupForCurrentUri()
     self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                         'htcpedia_maxthreads'))
     
     while self.__getThreads():
         try:
             self.currenturi = self.__removeSessionId('http://htcpedia.com/forum/' + self.soup.find('a', rel='next')['href'])
             self.__setSoupForCurrentUri()
         except:
             log.info(self.log_msg('Next Page link not found for url \
                                                 %s'%self.currenturi))
             break
     if self.__links_to_process:
         updateSessionInfo('Search', self.session_info_out,\
                 self.__last_timestamp , None, 'ForumThreadsPage', \
                 self.task.instance_data.get('update'))
     log.info(self.log_msg('# of tasks added is %d'%len(self.linksOut)))
     return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 25, Source: htcpediaconnector.py

Example 3: __createTasksForThreads

    def __createTasksForThreads(self):
        """
        This will create Tasks for the threads found on the given url
        The # of Tasks are limited by Config Variable
        """
        self.__current_thread_count = 0
        self.__last_timestamp = datetime(1980, 1, 1)
        self.__max_threads_count = int(tg.config.get(path='Connector', 
                                                     key='ivillage_maxthreads'))
        while self.__getThreads():
            try:
                link_next = self.soup.find('a', href=True, text='Next').parent['href']
                self.currenturi = link_next

                self.__setSoupForCurrentUri()
            except:
                log.exception(self.log_msg('Next Page link not found for url %s' % self.currenturi))
                break

        log.info('Total # of tasks found is %d' % len(self.linksOut))
        if self.linksOut:
            updateSessionInfo('Search', self.session_info_out, 
                              self.__last_timestamp , None, 'ForumThreadsPage', 
                              self.task.instance_data.get('update'))
        return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 25, Source: ivillageconnector.py

Example 4: __createTasksForThreads

 def __createTasksForThreads(self):
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     try:
         self.__total_threads_count = 0
         self.__last_timestamp = datetime( 1980,1,1 )
         # Maximum number of threads to process, because not all forums are
         # updated every day; at most it will be 100
         self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                             'talkandroid_maxthreads'))
         self.__setSoupForCurrentUri()
         while True:
             try:
                 if not self.__getThreads():
                     break
                 self.currenturi =  self.soup.find('a', text='&gt;').parent['href']
                 self.__setSoupForCurrentUri()
             except:
                 log.info(self.log_msg('Next Page link not found for url \
                                                     %s'%self.currenturi))
                 break
         if self.linksOut:
             updateSessionInfo('Search', self.session_info_out,\
                     self.__last_timestamp , None, 'ForumThreadsPage', \
                     self.task.instance_data.get('update'))
         return True
     except:
         log.exception(self.log_msg('Exception while creating tasks for the url %s'\
                                                             %self.currenturi)) 
         return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 32, Source: talkandroidconnector.py

Example 5: __createTasksForThreads

 def __createTasksForThreads(self):
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     log.info('hello')
     self.__current_thread_count = 0
     self.__last_timestamp = datetime(1980, 1, 1)
     self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                         'mdjunction_maxthreads'))
     while self.__getThreads():
         try:
             current_page_tag = self.soup.find('strong', text=re.compile(r'^\[\d+\]$'))
             self.currenturi = current_page_tag.findParent('td').find('a', text=str(int(current_page_tag[1:-1])+1)).parent['href']
             self.__setSoupForCurrentUri()
         except:
             log.exception(self.log_msg('Next Page link not found for url \
                                                 %s'%self.currenturi))
             break
     log.info('Total # of tasks found is %d'%len(self.linksOut))
     #self.linksOut = None
     if self.linksOut:
         updateSessionInfo('Search', self.session_info_out, \
                 self.__last_timestamp, None, 'ForumThreadsPage', \
                 self.task.instance_data.get('update'))
     return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 26, Source: mdjunctionconnector.py

Example 6: __createTasksForThreads

 def __createTasksForThreads(self):
         """
         This will create Tasks for the threads found on the given url
             The # of Tasks are limited by Config Variable
         """
         self.__setSoupForCurrentUri()
         self.__total_threads_count = 0
         self.__baseuri = 'http://baliforum.com'
         self.__last_timestamp =datetime(1980, 1, 1) 
         # Maximum number of threads to process, because not all forums are
         # updated every day; at most it will be 100
         self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                         'baliforum_maxthreads'))
         while self.__processForumUrl():
             try:
                 self.currenturi =self.soup.find('img', alt='Next page').findParent('a')['href']                    
                 self.__setSoupForCurrentUri()
             except:
                 log.info(self.log_msg('Next Page link not found for url \
                                                     %s'%self.currenturi))
                 break
         log.debug(self.log_msg('LINKSOUT: ' + str(len(self.linksOut))))
         #self.linksOut = [] # To Remove
         if self.linksOut:
             updateSessionInfo('Search', self.session_info_out, \
                         self.__last_timestamp , None, 'ForumThreadsPage', \
                         self.task.instance_data.get('update'))
         return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 28, Source: baliforumconnector.py

Example 7: __createTasksForThreads

 def __createTasksForThreads(self):
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     self.__current_thread_count = 0
     self.__last_timestamp = datetime( 1980,1,1 )
     self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                         'fatwallet_maxthreads'))
     while self.__getThreads():
         try:
             headers = []
             next_tag = self.soup.find('input', value='Next 20')
             form_tag = next_tag.findParent('form')
             input_values = form_tag.findAll('input', type='hidden')
             for input_value in input_values:
                 headers.append((input_value['name'],input_value['value'] ))
             self.currenturi = 'http://www.fatwallet.com' + form_tag['action'] + '?' + urlencode(headers )
             self.__setSoupForCurrentUri()
         except:
             log.exception(self.log_msg('Next Page link not found for url \
                                                 %s'%self.currenturi))
             break
     log.info('Total # of tasks found is %d'%len(self.linksOut))
     #self.linksOut = None
     if self.linksOut:
         updateSessionInfo('Search', self.session_info_out,\
                 self.__last_timestamp , None, 'ForumThreadsPage', \
                 self.task.instance_data.get('update'))
     return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 30, Source: fatwalletconnector.py

Example 8: fetch

 def fetch(self):
     """
     Fetches the first RESULTS_ITERATIONS results as specified by the attributes, and populate the result links to self.linksOut
     """
     try:
         if re.match(r".*/threads/?$", self.task.instance_data['uri']):
             self.last_timestamp = datetime(1,1,1)
             self.forum_name = re.findall(r'/([^/]+)/threads/?$', urlparse(self.task.instance_data['uri'])[2])[0]
             self.crawl_count = int(tg.config.get(path='Connector',key='microsoft_numresults'))
             self.count = 0
             self.done = False
             self.currenturi = self.task.instance_data['uri']+'?sort=firstpostdesc'
             while self.count< self.crawl_count and not self.done:
                 self.__getPageData()
             log.debug(self.log_msg("Length of linksout is %d"%(len(self.linksOut))))
             if self.linksOut:
                 updateSessionInfo('search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
             return True
         elif re.match(r".*/thread/.*?$", self.task.instance_data['uri']):
             self.__getThread()
             self.__getQuestion()
             self.__getAnswers()
             return True
         else:
             log.exception(self.log_msg("Unassociated url %s"%(self.task.instance_data['uri'])))
             return False
     except:
         log.exception(self.log_msg("Exception occurred in fetch()"))
         return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 29, Source: microsoftsocialconnector.py
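Example 8 decides what to crawl by matching the task URI against two patterns: a forum listing ending in /threads, or a single /thread/ page. The Python 3 sketch below isolates that dispatch step using raw-string patterns. The function name `classify_forum_uri` and the example.com URLs are illustrative assumptions.

```python
import re
from urllib.parse import urlparse


def classify_forum_uri(uri):
    """Return (kind, forum_name) for a forum URI.

    kind is 'threads' for a thread listing, 'thread' for a single
    thread page, and 'unknown' otherwise.
    """
    if re.match(r".*/threads/?$", uri):
        # The forum name is the path segment just before 'threads'.
        forum_name = re.findall(r"/([^/]+)/threads/?$", urlparse(uri).path)[0]
        return "threads", forum_name
    if re.match(r".*/thread/.*$", uri):
        return "thread", None
    return "unknown", None


listing = classify_forum_uri("http://social.example.com/Forums/en-US/sqlserver/threads")
single = classify_forum_uri("http://social.example.com/Forums/en-US/sqlserver/thread/abc-123")
other = classify_forum_uri("http://social.example.com/about")
```

Only the listing branch leads to an updateSessionInfo call in the original connector; the single-thread branch fetches the question and answers directly.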

Example 9: __createTasksForThreads

 def __createTasksForThreads(self):
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     self.__total_threads_count = 0
     self.__baseuri = 'http://forums.seagate.com'
     self.__last_timestamp = datetime(1980, 1, 1)
     # Maximum number of threads to process, because not all forums are
     # updated every day; at most it will be 100
     self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                         'seagateforums_maxthreads'))
     self.__setSoupForCurrentUri()
     while self.__getThreads():
         try:
             self.currenturi = self.__baseuri + self.soup.find('a', \
                     text='Next').findParent('a')['href'].split(';')[0]
             self.__setSoupForCurrentUri()
         except:
             log.info(self.log_msg('Next Page link not found for url \
                                                 %s'%self.currenturi))
             break
     #self.linksOut = []
     if self.linksOut:
         updateSessionInfo('Search', self.session_info_out, \
                 self.__last_timestamp , None, 'ForumThreadsPage', \
                 self.task.instance_data.get('update'))
     return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 28, Source: seagateforumsconnector.py

Example 10: __createTasksForThreads

 def __createTasksForThreads(self):
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     self.__total_threads_count = 0
     self.__last_timestamp = datetime( 1980,1,1 )
     self.__setSoupForCurrentUri()
     self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                         'htchd2forum_maxthreads'))
     current_page_no = 1
     while self.__getThreads():
         try:
             current_page_no += 1
             self.currenturi = self.__removeSessionId([x for x in self.soup.findAll('a', 'navPages') if int(stripHtml(x.renderContents()))==current_page_no][0]['href'])
             self.__setSoupForCurrentUri()
         except:
             log.info(self.log_msg('Next Page link not found for url \
                                                 %s'%self.currenturi))
             break
     if self.__links_to_process:
         updateSessionInfo('Search', self.session_info_out,\
                 self.__last_timestamp , None, 'ForumThreadsPage', \
                 self.task.instance_data.get('update'))
     log.info(self.log_msg('# of tasks added is %d'%len(self.linksOut)))
     return True

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 26, Source: htchd2forumconnector.py

Example 11: __createTasksForThreads

 def __createTasksForThreads(self):
     
     """
     This will create Tasks for the threads found on the given url
     The # of Tasks are limited by Config Variable
     """
     try:
                 
         self.__total_threads_count = 0
         self.__last_timestamp = datetime( 1980,1,1 )
         self.__max_threads_count = int(tg.config.get(path='Connector', key=\
                                             'iphoneforums_maxthreads'))
         self.__setSoupForCurrentUri()
         while self.__processForumUrl():
             try:
                 self.currenturi = self.soup.find('a',title = re.compile('Next Page - '))['href']
                 self.__setSoupForCurrentUri()
             except:
                 log.exception(self.log_msg('Next Page link not found for url \
                                                 %s'%self.currenturi))
                 break                
                 
         log.info(self.log_msg('LINKSOUT: ' + str(len(self.linksOut))))
         #self.linksOut = [] # To Remove
         if self.linksOut:
             updateSessionInfo('Search', self.session_info_out, \
                         self.__last_timestamp , None, 'ForumThreadsPage', \
                         self.task.instance_data.get('update'))
         return True  
     except:
         log.info(self.log_msg('Exception while creating tasks for the url %s'\
                                                      %self.currenturi)) 
         return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 33, Source: iphoneforumsconnector.py

Example 12: fetch

    def fetch(self):
        """
        Fetch of egg head cafe
        """
        self.genre="Review"
        try:
            self.base_url = 'http://www.eggheadcafe.com'
            self.parent_uri = self.currenturi
            self.total_posts_count = 0
            self.last_timestamp = datetime( 1980,1,1 )
            self.max_posts_count = int(tg.config.get(path='Connector',key='eggheadcafe_max_threads_to_process'))
            #headers={'Host':'www.eggheadcafe.com'}
            #headers['Referer'] = self.currenturi
            #data = dict(parse_qsl(self.currenturi.split('?')[-1]))
            if 'forumtree.aspx' not in self.currenturi:
                if not self.__setSoup():
                    log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                    return False
                self.__getParentPage()
                while True:
                    parent_soup = copy.copy(self.soup)
                    self.__addPosts()
                    try:
                        self.currenturi = self.base_url +  parent_soup.find('a',text='Next').parent['href']
                        if not self.__setSoup():
                            break
                    except:
                        log.info(self.log_msg('Next Page link not found'))
                        break
                return True
            else:
                if not self.__setSoup():
                    log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                    return False
                while True:
                    try:
                        if not self.__getThreadPage():
                            break
##                        data = dict(parse_qsl(self.currenturi.split('?')[-1]))
##                        data['ctl00$ContentPlaceHolder1$ddlMessageCount'] = '20'
##                        data['ctl00$ContentPlaceHolder1$ddlOrder'] ='Desc'
##                        data['__EVENTTARGET'] = self.soup.find('a',id=re.compile('LinkButtonNext'))['id'].replace('_','$')
##                        jscript_arg = ['__EVENTVALIDATION','__VIEWSTATE']
##                        for each in jscript_arg:
##                            data[each] =  self.soup.find('input',id=each)['value']
                        self.currenturi = self.base_url +  self.soup.find('a',text='Next').parent['href']
                        if not self.__setSoup():
                            break
                    except:
                        log.info(self.log_msg('Next Page link not found'))
                        break
                if self.linksOut:
                    updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
                return True
        except:
            log.exception(self.log_msg('Exception in fetch'))
            return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 57, Source: eggheadcafeconnector.py

Example 13: fetch

 def fetch(self):
     """
     Fetch of http://forums.devx.com
     """
     self.genre="Review"
     try:
         
         self.parent_uri = self.currenturi
         log.info(self.parent_uri)
         self.currenturi =  self.__getStandUri(self.parent_uri)
         log.info(self.log_msg('The Standard Uri is'))
         log.info(self.parent_uri)
         if self.currenturi.startswith('http://forums.devx.com/showthread.'):
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             self.__getParentPage()
             self.post_type= True
             while True:
                 self.__addPosts()
                 try:
                     self.currenturi = self.__getStandUri('http://forums.devx.com/' + self.soup.find('a',text='&gt;').parent['href'])
                 except:
                     log.info(self.log_msg('Next page not set'))
                     break
                 if not self.__setSoup():
                     log.info(self.log_msg('cannot continue'))
                     break
             return True
         elif self.currenturi.startswith('http://forums.devx.com/forumdisplay'):
             self.total_posts_count = 0
             self.last_timestamp = datetime( 1980,1,1 )
             self.max_posts_count = int(tg.config.get(path='Connector',key='devxforum_numresults'))
             self.currenturi = self.currenturi + '&daysprune=-1&order=desc&sort=lastpost'
             log.info(self.log_msg('The link is:'))
             log.info(self.currenturi)
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             while True:
                 if not self.__getThreads():
                     break
                 try:
                     self.currenturi = self.__getStandUri('http://forums.devx.com/' + self.soup.find('a',text='&gt;').parent['href'])
                     if not self.__setSoup():
                         break
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             if self.linksOut:
                 updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
             return True
         else:
             log.info(self.log_msg('URL format is not recognized; please verify the URL'))
             return False
     except:
         log.exception(self.log_msg('Exception in fetch'))
         return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 57, Source: devxconnector.py

Example 14: fetch

 def fetch(self):
     """
     Fetch of mynextcollege forums
     sample uri :  http://www.mynextcollege.com/college-reviews/discussion-room-f6.html
     """
     self.genre="Review"
     try:
         self.parent_uri = self.currenturi
         self.currenturi = self.currenturi.split('-sid=')[0]
         if self.currenturi=='http://www.mynextcollege.com/college-reviews/':
             try:
                 if not self.__setSoup():
                     return False
                 self.__addFortumLinks()
             except:
                 log.info(self.log_msg('cannot add tasks'))
                 return False
         if re.match(r'.*?-f\d+\.html$', self.currenturi):
             self.total_posts_count = 0
             self.last_timestamp = datetime( 1980,1,1 )
             self.max_posts_count = int(tg.config.get(path='Connector',key='mynextcollege_numresults'))
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             while True:
                 if not self.__getThreads():
                     break
                 try:
                     self.currenturi = 'http://www.mynextcollege.com/college-reviews' + self.soup.find('a',text='Next').parent['href'][1:].split('-sid=')[0]
                     if not self.__setSoup():
                         break
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             if self.linksOut:
                 updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
             return True
         else:
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             self.__getParentPage()
             self.post_type= True
             while True:
                 self.__addPosts()
                 try:
                     self.currenturi = 'http://www.mynextcollege.com/college-reviews' + self.soup.find('a',text='Next').parent['href'][1:].split('-sid=')[0]
                     if not self.__setSoup():
                         break
                 except:
                     log.info(self.log_msg('Next page not set'))
                     break
             return True
     except:
         log.exception(self.log_msg('Exception in fetch'))
         return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 56, Source: mynextcollegeconnector.py

Example 15: fetch

 def fetch(self):
     """
     Fetch of forum page
     """
     self.genre="Review"
     try:
         self.parent_uri = self.currenturi
         self.base_url = 'http://ocenbank.pl/forum/'
         if self.currenturi.startswith('http://ocenbank.pl/forum/viewforum'):
             self.total_posts_count = 0
             self.last_timestamp = datetime( 1980,1,1 )
             self.max_posts_count = int(tg.config.get(path='Connector',key='ocean_forum_numresults'))
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             next_page_no = 2
             while True:
                 if not self.__getThreads():
                     break
                 try:
                     self.currenturi = self.base_url + self.soup.find('p','pagelink conl').find('a',text=str(next_page_no)).parent['href']
                     if not self.__setSoup():
                         break
                     next_page_no = next_page_no + 1
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             if self.linksOut:
                 updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
             return True
         elif self.currenturi.startswith('http://ocenbank.pl/forum/viewtopic'):
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             self.__getParentPage()
             self.post_type = True
             next_page_no = 2
             while True:
                 self.__addPosts()
                 try:
                     self.currenturi = self.base_url + self.soup.find('p','pagelink conl').find('a',text=str(next_page_no)).parent['href']
                     if not self.__setSoup():
                         break
                     next_page_no = next_page_no + 1
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             return True
         else:
             log.info(self.log_msg('Unrecognized URL supplied: ' + self.currenturi))
             return False
     except:
         log.exception(self.log_msg('Exception in fetch'))
         return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 55, Source: oceanconnector.py

Example 16: fetch

 def fetch(self):
     """
     Fetch of http://forums.msexchange.org/Message_Routing/forumid_18/tt.htm
     """
     self.genre="Review"
     try:
         #self.currenturi ='http://forums.msexchange.org/Outlook_anywhere/m_1800490386/tm.htm'
         self.parent_uri = self.currenturi
         forum_id = self.currenturi.split('/')[-2]
         if forum_id.startswith('forumid'):
             self.total_posts_count = 0
             self.last_timestamp = datetime( 1980,1,1 )
             self.max_posts_count = int(tg.config.get(path='Connector',key='msexchange_forum_numresults'))
             self.currenturi = 'http://forums.msexchange.org/%s/p_1/tmode_1/smode_1/tt.htm'%forum_id
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             while True:
                 if not self.__getThreads():
                     break
                 try:
                     self.currenturi = 'http://forums.msexchange.org' + self.soup.find('a',text='next &gt;').findParent('a')['href']
                     if not self.__setSoup():
                         break
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             if self.linksOut:
                 updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
             return True
         else:
             #self.currenturi = 'http://forums.msexchange.org/%s/p_1/tmode_2/smode_1/tt.htm'%forum_id
             #headers = {'Referer':self.task.pagedata['Referer']}
             #log.info(headers)
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             self.__getParentPage()
             self.post_type= True
             while True:
                 self.__addPosts()
                 try:
                     self.currenturi = 'http://forums.msexchange.org' + self.soup.find('a',text='next &gt;').findParent('a')['href']
                 except:
                     log.info(self.log_msg('Next page not set'))
                     break
                 if not self.__setSoup():
                     log.info(self.log_msg('cannot continue'))
                     break
             return True
     except:
         log.exception(self.log_msg('Exception in fetch'))
         return False

Developer ID: jsyadav, Project: CrawlerFramework, Lines of code: 53, Source: msexchangeforumconnector.py

Example 17: fetch

 def fetch(self):
     """
     Fetch of sql server central
     """
     self.genre="Review"
     try:
         self.parent_uri = self.currenturi
         self.total_posts_count = 0
         self.last_timestamp = datetime( 1980,1,1 )
         self.max_posts_count = int(tg.config.get(path='Connector',key='sqlservercentral_numresults'))
         self.hrefs_info = self.currenturi.split('/')
         if self.currenturi.startswith('http://www.sqlservercentral.com/Forums/Topic'):
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             self.__getParentPage()
             self.post_type = True
             next_page_no = 2
             while True:
                 self.__addPosts()
                 try:
                     self.currenturi = 'http://www.sqlservercentral.com/Forums/' +  self.soup.find('table',id= re.compile('FooterTable')).find('a',text=str(next_page_no)).parent['href']
                     if not self.__setSoup():
                         break
                     next_page_no = next_page_no + 1
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             return True
         else:
             self.currenturi = self.currenturi.replace('Default.aspx','afcol/0/afsort/DESC/Default.aspx')
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             next_page_no = 2
             while True:
                 if not self.__getThreadPage():
                     break
                 try:
                     self.currenturi = 'http://www.sqlservercentral.com/Forums/' +  self.soup.find('a',title='Next Page')['href']
                     if not self.__setSoup():
                         break
                     next_page_no = next_page_no + 1
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             if self.linksOut:
                 updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
             return True
     except:
         log.exception(self.log_msg('Exception in fetch'))
         return False

Developer ID: jsyadav, project: CrawlerFramework, lines of code: 52, source: sqlservercentralconnector.py
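The loop in this example keeps following a "next page" link until the link lookup fails or the next page cannot be loaded. Below is a minimal sketch of that pattern, assuming a hypothetical in-memory `PAGES` mapping in place of real HTTP requests and BeautifulSoup parsing. The names `PAGES`, `load_page`, and `crawl_all_pages` are illustrative and are not part of CrawlerFramework.

```python
# Hypothetical stand-in for a site: each URL maps to (posts, next_url).
PAGES = {
    "page1": (["post-a", "post-b"], "page2"),
    "page2": (["post-c"], "page3"),
    "page3": (["post-d"], None),  # last page: no next link
}


def load_page(url):
    """Return (posts, next_url) for a URL, or None if it cannot be loaded."""
    return PAGES.get(url)


def crawl_all_pages(start_url, max_pages=100):
    """Collect posts page by page until there is no next link."""
    collected = []
    url = start_url
    for _ in range(max_pages):  # hard cap guards against link cycles
        page = load_page(url)
        if page is None:
            break  # mirrors "if not self.__setSoup(): break"
        posts, next_url = page
        collected.extend(posts)
        if next_url is None:
            break  # mirrors the "Next Page link not found" branch
        url = next_url
    return collected


print(crawl_all_pages("page1"))
# ['post-a', 'post-b', 'post-c', 'post-d']
```

Unlike the `while True` loop in the listing, the sketch caps the number of pages, so a site whose "next" links form a cycle cannot keep the crawler running forever.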

Example 18: fetch

    def fetch(self):
        """
        Fetch posts from an able2know.org topic page, or threads from a tag page.
        """
        self.genre="Review"
        try:
            self.parent_uri = self.currenturi
            if self.currenturi.startswith('http://able2know.org/topic/'):
                if not self.__setSoup():
                    log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                    return False
                self.__getParentPage()
                self.post_type= True
                while True:
                    self.__addPosts()
                    try:
                        self.currenturi = self.soup.find('a',title = 'Next Page')['href']
                    except:
                        log.info(self.log_msg('Next page not set'))
                        break
                    if not self.__setSoup():
                        log.info(self.log_msg('cannot continue'))
                        break
                return True
            elif self.currenturi.startswith('http://able2know.org/tag/'):
                self.total_posts_count = 0
                self.last_timestamp = datetime( 1980,1,1 )
                self.max_posts_count = int(tg.config.get(path='Connector',key='know_forum_numresults'))
                if not self.__setSoup():
                    log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                    return False
                count = 2
                while True:
                    if not self.__getThreads():
                        break
                    try:
                        self.currenturi = self.currenturi + self.soup.find('a',accesskey='n')['href'].lstrip('.')
                        if not self.__setSoup():
                            break
                    except:
                        log.info(self.log_msg('Next Page link not found'))
                        break
                    count = count+1
                if self.linksOut:
                    updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
                return True
            else:
                log.info(self.log_msg('URL format is not recognized, please verify the URL'))
                return False
        except:
            log.exception(self.log_msg('Exception in fetch'))
            return False

Developer ID: jsyadav, project: CrawlerFramework, lines of code: 52, source: able2knowconnector.py
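In each listing, `updateSessionInfo` is called only when `self.linksOut` is non-empty, and it receives the timestamp tracked during the crawl. The sketch below illustrates only that calling pattern. `record_session_info` is a hypothetical stand-in, not the real `utils.sessioninfomanager.updateSessionInfo`, whose behavior the listings do not show. Its parameter names are guesses based on the argument positions in the calls above, and taking the newest thread date as the timestamp is likewise an assumption.

```python
from datetime import datetime


def record_session_info(genre, session_info, timestamp, parent, page_type, update):
    """Hypothetical stand-in: store the arguments so they can be inspected."""
    session_info[page_type] = {
        "genre": genre,
        "timestamp": timestamp,
        "parent": parent,
        "update": update,
    }


def finish_crawl(links_out, thread_times, session_info, update=False):
    """Record session info only if the crawl produced at least one task."""
    # Same sentinel as the listings: any real thread date is newer than this.
    last_timestamp = datetime(1980, 1, 1)
    for seen in thread_times:
        last_timestamp = max(last_timestamp, seen)  # assumed update rule
    if links_out:
        record_session_info("Search", session_info, last_timestamp,
                            None, "ForumThreadsPage", update)
    return True


info = {}
finish_crawl(["thread-1"], [datetime(2010, 5, 1), datetime(2011, 2, 3)], info)
print(info["ForumThreadsPage"]["timestamp"])
# 2011-02-03 00:00:00

empty_info = {}
finish_crawl([], [datetime(2010, 5, 1)], empty_info)
print(empty_info)
# {}
```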

Example 19: fetch

 def fetch(self):
     """www.petri.co.il
     """
     self.genre="Review"
     try:
         self.parent_uri = self.currenturi
         if self.currenturi.startswith('http://www.petri.co.il/forums/showthread'):
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             self.__getParentPage()
             self.post_type= True
             while True:
                 self.__addPosts()
                 try:
                     self.currenturi = 'http://www.petri.co.il/forums/' + self.soup.find('a',text='&gt;').parent['href']
                 except:
                     log.info(self.log_msg('Next page not set'))
                     break
                 if not self.__setSoup():
                     log.info(self.log_msg('cannot continue'))
                     break
             return True
         else:
             self.total_posts_count = 0
             self.last_timestamp = datetime( 1980,1,1 )
             try:
                 self.max_posts_count = int(tg.config.get(path='Connector',key='petri_max_threads_count'))
             except:
                 log.info(self.log_msg('max threads count not set'))
                 self.max_posts_count=5
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             while True:
                 if not self.__getThreads():
                     break
                 try:
                     self.currenturi = 'http://www.petri.co.il/forums/' + self.soup.find('a',text='&gt;').parent['href']
                     if not self.__setSoup():
                         break
                 except:
                     log.info(self.log_msg('Next Page link not found'))
                     break
             #self.linksOut=None
             if self.linksOut:
                 updateSessionInfo('Search', self.session_info_out,self.last_timestamp , None,'ForumThreadsPage', self.task.instance_data.get('update'))
             return True
     except:
         log.exception(self.log_msg('Exception in fetch'))
         return False

Developer ID: jsyadav, project: CrawlerFramework, lines of code: 51, source: petriconnector.py
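Example 19 falls back to a default of 5 when the `petri_max_threads_count` setting cannot be read. Below is a sketch of that fallback as a reusable helper, assuming a plain dictionary in place of `tg.config`. The helper name `get_int_setting` and the sample keys other than `petri_max_threads_count` are hypothetical.

```python
def get_int_setting(settings, key, default):
    """Return settings[key] as an int, or default if missing or malformed."""
    try:
        return int(settings[key])
    except (KeyError, TypeError, ValueError):
        # Catch only the expected failures instead of a bare "except:".
        return default


settings = {"petri_max_threads_count": "25", "broken": "many"}
print(get_int_setting(settings, "petri_max_threads_count", 5))  # 25
print(get_int_setting(settings, "missing_key", 5))              # 5
print(get_int_setting(settings, "broken", 5))                   # 5
```

Naming the exception types keeps unrelated errors, such as a keyboard interrupt, from being silently replaced by the default value.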

Example 20: fetch

 def fetch(self):
     """
     Fetch of
     """
     self.genre="Review"
     try:
         self.parent_uri = self.currenturi
         rcheck = re.compile(r'\d+-',re.U)
         
         #if self.currenturi.startswith('http://talk.collegeconfidential.com/college-admissions/'):
         if rcheck.search(self.currenturi):
             if not self.__setSoup():
                 log.info(self.log_msg('Soup not set , Returning False from Fetch'))
                 return False
             self.__getParentPage()
             self.post_type= True
             while True:
                 self.__addPosts()
                 try:
                     self.currenturi = '' + self.soup.find('a',text='&gt;').parent['href']
                 except:
                     log.info(self.log_msg('Next page not set'))
                     break
