Java DocData Class Code Examples


This article collects typical usage examples of the Java class org.apache.lucene.benchmark.byTask.feeds.DocData. If you are wondering how DocData is used in practice, or are looking for working examples of it, the selected snippets below may help.

The DocData class belongs to the org.apache.lucene.benchmark.byTask.feeds package. Nine code examples are shown below, sorted by popularity by default.

Example 1: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
@Override
public synchronized DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  String[] tuple = parser.next();
  docData.clear();
  docData.setName(tuple[ID]);
  docData.setBody(tuple[TITLE] + " " + tuple[BODY]);
  docData.setDate(tuple[DATE]);
  docData.setTitle(tuple[TITLE]);
  /*
   *  TODO: @leo This is not a real URL, maybe we will need a real URL some day.
   *             This should be fine for sorting purposes, though. If the input
   *             is unsorted and we want to produce sorted document ids,
   *             this is just fine.
   */
  Properties props = new Properties();
  props.put("url", tuple[TITLE]);
  docData.setProps(props); 
  return docData;
}

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 20, source file: EnwikiContentSource.java
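
The reuse pattern in Example 1 (call clear(), then repopulate every field of the same object) can be tried without Lucene on the classpath. The following is a minimal sketch under stated assumptions: ReusableDoc is a hypothetical stand-in that mimics only the handful of DocData methods used above, and the tuple layout {id, title, body} is invented for illustration. It is not the Lucene class.

```java
import java.util.Properties;

// Hypothetical stand-in for DocData: a mutable holder that a content source
// fills in again for every document instead of allocating a new object.
class ReusableDoc {
    private String name;
    private String title;
    private String body;
    private Properties props;

    // Reset every field so nothing from the previous document leaks through.
    void clear() {
        name = null;
        title = null;
        body = null;
        props = null;
    }

    void setName(String name) { this.name = name; }
    void setTitle(String title) { this.title = title; }
    void setBody(String body) { this.body = body; }
    void setProps(Properties props) { this.props = props; }

    String getName() { return name; }
    String getTitle() { return title; }
    String getBody() { return body; }
    Properties getProps() { return props; }
}

class ReusableDocDemo {
    // Fill the holder from a tuple laid out as {id, title, body}
    // (this layout is an assumption of the sketch).
    static ReusableDoc fill(ReusableDoc doc, String[] tuple) {
        doc.clear();
        doc.setName(tuple[0]);
        doc.setTitle(tuple[1]);
        doc.setBody(tuple[1] + " " + tuple[2]);
        Properties props = new Properties();
        props.put("url", tuple[1]); // the title doubles as a pseudo URL, as in Example 1
        doc.setProps(props);
        return doc;
    }

    public static void main(String[] args) {
        ReusableDoc doc = new ReusableDoc();
        fill(doc, new String[] {"1", "First", "body one"});
        fill(doc, new String[] {"2", "Second", "body two"});
        System.out.println(doc.getName() + ": " + doc.getBody());
    }
}
```

Because the same object is returned on every call, a caller that needs to keep a document beyond the next call has to copy the fields it cares about.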

Example 2: parse

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
@Override
public DocData parse(DocData docData, String name, Date date, Reader reader, 
                     ContentSourceDateUtil trecSrc) throws IOException {
  try {
    return parse(docData, name, date, new InputSource(reader), trecSrc);
  } catch (SAXException saxe) {
    throw new IOException("SAX exception occurred while parsing HTML document.", saxe);
  }
}

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 10, source file: DemoHTMLParser.java

Example 3: parse

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
@Override
public DocData parse(DocData docData, 
                     String name, 
                     Date date, 
                     Reader reader, 
                     ContentSourceDateUtil trecSrc) throws IOException {

  return parse(docData, name, date, new InputSource(reader), trecSrc);
}

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 10, source file: LeoHTMLParser.java

Example 4: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
@Override
public DocData getNextDocData(DocData docData)
    throws NoMoreDataException, IOException {
  return docData;
}

Developer ID: europeana, project: search, lines of code: 6, source file: TestPerfTasksParse.java

Example 5: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
@Override
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  String name = null;
  StringBuilder docBuf = getDocBuffer();
  ParsePathType parsedPathType;
  
  // protect reading from the TREC files by multiple threads. The rest of the
  // method, i.e., parsing the content and returning the DocData can run unprotected.
  synchronized (lock) {
    if (reader == null) {
      openNextFile();
    }
    
    // 1. skip until doc start - required for all TREC formats
    docBuf.setLength(0);
    read(docBuf, DOC, false, false);
    
    // save parsedFile for passing trecDataParser after the sync block, in 
    // case another thread will open another file in between.
    parsedPathType = currPathType;
    
    // 2. name - required for all TREC formats
    docBuf.setLength(0);
    read(docBuf, DOCNO, true, false);
    name = docBuf.substring(DOCNO.length(), docBuf.indexOf(TERMINATING_DOCNO,
        DOCNO.length())).trim();
    
    if (!excludeDocnameIteration) {
      name = name + "_" + iteration;
    }

    // 3. read all until end of doc
    docBuf.setLength(0);
    read(docBuf, TERMINATING_DOC, false, true);
  }
    
  // count char length of text to be parsed (may be larger than the resulting plain doc body text).
  addBytes(docBuf.length()); 

  // This code segment relies on HtmlParser being thread safe. When we get 
  // here, everything else is already private to that thread, so we're safe.
  docData = trecDocParser.parse(docData, name, this, docBuf, parsedPathType);
  addItem();

  return docData;
}

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 47, source file: TrecContentSource.java
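
Example 5 holds the lock only while it reads from the shared reader and does the parsing outside the synchronized block. The sketch below isolates that pattern with standard-library classes only. NarrowLockSource and its parse step are hypothetical names invented for this illustration, not part of Lucene or of the project above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

// Sketch of the locking pattern in Example 5: only the read from the shared
// reader is synchronized; the more expensive parsing runs outside the lock.
class NarrowLockSource {
    private final Object lock = new Object();
    private final BufferedReader reader;

    NarrowLockSource(BufferedReader reader) {
        this.reader = reader;
    }

    // Returns the next parsed record, or null when the input is exhausted.
    String next() {
        String raw;
        synchronized (lock) {
            // Shared state (the reader) is touched only inside this block.
            try {
                raw = reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (raw == null) {
            return null;
        }
        // 'raw' is now private to the calling thread, so no lock is needed.
        return parse(raw);
    }

    // Hypothetical stand-in for the real parsing step.
    private static String parse(String raw) {
        return raw.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        NarrowLockSource src = new NarrowLockSource(
            new BufferedReader(new StringReader(" alpha \n beta \n")));
        for (String rec = src.next(); rec != null; rec = src.next()) {
            System.out.println(rec);
        }
    }
}
```

The design choice is the usual one: keep the critical section as short as possible so that threads queue only for I/O on the shared stream, and make sure everything used after the block is a thread-local copy.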

Example 6: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
@Override
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  WarcRecord  CurrRec = null;
  
  // Protect reading from the WARC files by multiple threads. The rest of the
  // method, i.e., parsing the content and returning the DocData, can run unprotected.
  synchronized (lock) {
    if (reader == null) {
      openNextFile();
    }
    
    do {
      CurrRec = WarcRecord.readNextWarcRecord(reader);
      /*
       *  We need to skip special auxiliary entries, e.g., in the
       *  beginning of the file.
       */
      
    } while (CurrRec != null && !CurrRec.getHeaderRecordType().equals("response"));
    
    if (CurrRec == null) {
      openNextFile();
      return getNextDocData(docData);
    }
  }
      
 
  Date    date = parseDate(CurrRec.getHeaderMetadataItem("WARC-Date"));    
  String  url = CurrRec.getHeaderMetadataItem("WARC-Target-URI");
    
  // This code segment relies on HtmlParser being thread safe. When we get 
  // here, everything else is already private to that thread, so we're safe.
  if (url.startsWith("http://") || 
      url.startsWith("ftp://") ||
      url.startsWith("https://")
      ) {          
    String Response = CurrRec.getContentUTF8();

    int EndOfHead = Response.indexOf("\n\n");
    
    if (EndOfHead >= 0) {
      String html = Response.substring(EndOfHead + 2);

      docData = htmlParser.parse(docData, url, date, new StringReader(html), this);
      // This should be done after parse(), because parse() resets properties
      docData.getProps().put("url", url);
    } else {
      /*
       *  TODO: @leo What do we do here exactly? 
       *  The interface doesn't allow us to signal that an entry should be skipped. 
       */    
      System.err.println("Cannot extract HTML in URI: " + url);          
    }
  } else {
    /*
     *  TODO: @leo What do we do here exactly? 
     *  The interface doesn't allow us to signal that an entry should be skipped. 
     */    
    System.err.println("Ignoring schema in URI: " + url);  
  }

  addItem();

  return docData;
}

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 68, source file: ClueWeb09ContentSource.java
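
Example 6 finds the end of the HTTP headers by searching for "\n\n". If a stored response uses CRLF line endings, the blank line is "\r\n\r\n" and that search would not match. The sketch below is one possible way to handle both cases, together with a null-safe, case-insensitive version of the scheme check. HttpResponseSplitter, bodyOf and hasSupportedScheme are hypothetical helper names, not part of the project shown above.

```java
import java.util.Locale;

// Sketch: separate the HTTP header block from the payload of a stored response.
class HttpResponseSplitter {
    // Returns the text after the first blank line (LF or CRLF style),
    // or null if the response contains no blank line.
    static String bodyOf(String response) {
        int crlf = response.indexOf("\r\n\r\n");
        int lf = response.indexOf("\n\n");
        // Use whichever separator occurs first.
        if (crlf >= 0 && (lf < 0 || crlf < lf)) {
            return response.substring(crlf + 4);
        }
        if (lf >= 0) {
            return response.substring(lf + 2);
        }
        return null;
    }

    // Same three schemes as in Examples 6 and 7, but tolerant of a missing
    // URL and of upper-case scheme names.
    static boolean hasSupportedScheme(String url) {
        if (url == null) {
            return false;
        }
        String u = url.toLowerCase(Locale.ROOT);
        return u.startsWith("http://") || u.startsWith("https://") || u.startsWith("ftp://");
    }

    public static void main(String[] args) {
        String response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
        System.out.println(bodyOf(response));
        System.out.println(hasSupportedScheme("HTTP://example.com/"));
    }
}
```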

Example 7: parse

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
@Override
public DocData parse(DocData docData, String name, TrecContentSource trecSrc, 
    StringBuilder docBuf, ParsePathType pathType) throws IOException {
  // skip some of the non-html text, optionally set date
  Date date = null;
  int start = 0;
  final int h1 = docBuf.indexOf(DOCHDR);
  if (h1 >= 0) {
    final int hStart2dLine = h1 + DOCHDR.length() + 1;
    final int hEnd2dLine = docBuf.indexOf("\n", hStart2dLine);
    
    if (hEnd2dLine >= 0) {
      String url = docBuf.substring(hStart2dLine, hEnd2dLine)
                                    .toLowerCase().trim();
      
      if (url.startsWith("http://") || 
          url.startsWith("ftp://") ||
          url.startsWith("https://")
          ) {          
        final int h2 = docBuf.indexOf(TERMINATING_DOCHDR, h1);
        final String dateStr = extract(docBuf, DATE, DATE_END, h2, null);
        if (dateStr != null) {
          date = trecSrc.parseDate(dateStr);
        }
        start = h2 + TERMINATING_DOCHDR.length();
        
        final String html = docBuf.substring(start);
        docData = trecSrc.getHtmlParser().parse(docData, name, date, new StringReader(html), trecSrc);
        // This should be done after parse(), b/c parse() resets properties
        docData.getProps().put("url", url);
        return docData;
      } else {
        System.err.println("Ignoring schema in URI: " + url);  
      }
    } else {
      System.err.println("Invalid header: " + docBuf.toString());
    }
  }
  
  /*
   *  TODO: @leo What do we do here exactly? 
   *  The interface doesn't allow us to signal that an entry should be skipped. 
   */    
  
  return docData;
}

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 47, source file: TrecGov2Parser.java
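
Example 7 calls an extract(...) helper whose body is not shown on this page. As a rough illustration of what such a helper could do, the sketch below returns the text between a start marker and an end marker, limited to a maximum position in the buffer. TagExtractor.between is a hypothetical, simplified helper written for this article; it is an assumption, not the method the example actually calls.

```java
// Sketch: pull the text between two markers out of a StringBuilder,
// ignoring matches that lie beyond a given position.
class TagExtractor {
    // Returns the trimmed text between startTag and endTag, or null if either
    // marker is missing or lies beyond maxPos. A negative maxPos means "no limit".
    static String between(StringBuilder buf, String startTag, String endTag, int maxPos) {
        int start = buf.indexOf(startTag);
        if (start < 0 || (maxPos >= 0 && start >= maxPos)) {
            return null;
        }
        start += startTag.length();
        int end = buf.indexOf(endTag, start);
        if (end < 0 || (maxPos >= 0 && end > maxPos)) {
            return null;
        }
        return buf.substring(start, end).trim();
    }

    public static void main(String[] args) {
        StringBuilder doc = new StringBuilder(
            "<DOCHDR>\nhttp://example.com/\nDate: Mon, 01 Jan 2001\n</DOCHDR>\n<html></html>");
        int headerEnd = doc.indexOf("</DOCHDR>");
        System.out.println(between(doc, "Date: ", "\n", headerEnd));
    }
}
```

Note that indexOf returns -1 when a marker is missing, so a caller that computes an offset from it, as Example 7 does with h2, may want to check for that case first.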

Example 8: parse

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
/** 
 * parse the text prepared in docBuf into a result DocData, 
 * no synchronization is required.
 * @param docData reusable result
 * @param name name that should be set to the result
 * @param trecSrc calling trec content source  
 * @param docBuf text to parse  
 * @param pathType type of parsed file, or null if unknown - may be used by 
 * parsers to alter their behavior according to the file path type. 
 */  
public abstract DocData parse(DocData docData, String name, TrecContentSource trecSrc, 
    StringBuilder docBuf, ParsePathType pathType) throws IOException;

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 13, source file: TrecDocParser.java

Example 9: parse

import org.apache.lucene.benchmark.byTask.feeds.DocData; // import the required package/class
/**
 * Parse the input Reader and return DocData. 
 * The provided name, title and date are used for the result unless they are null, 
 * in which case an attempt is made to set them from the parsed data.
 * @param docData result reused
 * @param name name of the result doc data.
 * @param date date of the result doc data. If null, attempt to set by parsed data.
 * @param reader reader of html text to parse.
 * @param trecSrc the {@link ContentSourceDateUtil} used to parse dates.   
 * @return Parsed doc data.
 * @throws IOException If there is a low-level I/O error.
 */
public DocData parse(DocData docData, String name, Date date, Reader reader, ContentSourceDateUtil trecSrc) throws IOException;

Developer ID: searchivarius, project: IndexTextCollect, lines of code: 14, source file: HTMLParser.java

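Note: The org.apache.lucene.benchmark.byTask.feeds.DocData examples in this article were collected from source-code and documentation hosting platforms such as GitHub and MSDocs. The snippets were selected from open-source projects; copyright remains with the original authors, and distribution and use should follow each project's license. Do not reproduce without permission.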
