Java NoMoreDataException Class Code Examples

This article collects typical usage examples of the Java class org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException. If you are wondering how NoMoreDataException is used in practice, the selected examples below may help.



The NoMoreDataException class belongs to the org.apache.lucene.benchmark.byTask.feeds package. Ten code examples of the class are shown below, sorted by popularity by default.

Example 1: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public synchronized DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  String[] tuple = parser.next();
  docData.clear();
  docData.setName(tuple[ID]);
  docData.setBody(tuple[TITLE] + " " + tuple[BODY]);
  docData.setDate(tuple[DATE]);
  docData.setTitle(tuple[TITLE]);
  /*
   *  TODO: @leo This is not a real URL, maybe we will need a real URL some day.
   *             This should be fine for sorting purposes, though. If the input
   *             is unsorted and we want to produce sorted document ids,
   *             this is just fine.
   */
  Properties props = new Properties();
  props.put("url", tuple[TITLE]);
  docData.setProps(props); 
  return docData;
}
 
Developer ID: searchivarius, Project: IndexTextCollect, Lines of code: 20, Source: EnwikiContentSource.java


Example 2: extract

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
public void extract() throws Exception {
  Document doc = null;
  System.out.println("Starting Extraction");
  long start = System.currentTimeMillis();
  try {
    while ((doc = docMaker.makeDocument()) != null) {
      create(doc.get(DocMaker.ID_FIELD), doc.get(DocMaker.TITLE_FIELD), doc
          .get(DocMaker.DATE_FIELD), doc.get(DocMaker.BODY_FIELD));
    }
  } catch (NoMoreDataException e) {
    //continue
  }
  long finish = System.currentTimeMillis();
  System.out.println("Extraction took " + (finish - start) + " ms");
}
 
Developer ID: europeana, Project: search, Lines of code: 16, Source: ExtractWikipedia.java
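
The loop above depends on the source throwing an exception, rather than returning a sentinel value, once the data runs out. The following is a minimal, self-contained sketch of that pattern. It does not use Lucene: `EndOfDataException`, `ListSource`, and `drain` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for NoMoreDataException (an assumption, not the Lucene class).
class EndOfDataException extends Exception {
}

// Hypothetical source that signals exhaustion by throwing instead of returning null.
class ListSource {
  private final Iterator<String> it;

  ListSource(List<String> items) {
    this.it = items.iterator();
  }

  String next() throws EndOfDataException {
    if (!it.hasNext()) {
      throw new EndOfDataException();
    }
    return it.next();
  }
}

class ExhaustionLoopSketch {
  // Pulls items until the source reports exhaustion; the exception ends the loop normally.
  static List<String> drain(ListSource source) {
    List<String> out = new ArrayList<>();
    try {
      while (true) {
        out.add(source.next());
      }
    } catch (EndOfDataException e) {
      // Expected: the source is exhausted, so return what was collected so far.
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(drain(new ListSource(Arrays.asList("a", "b", "c"))));
  }
}
```

Because the exception is the normal termination signal here, the empty catch block is intentional; any other failure type would still propagate to the caller.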


Example 3: read

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
/**
 * Read until a line starting with the specified <code>lineStart</code>.
 * @param buf buffer for collecting the data, if so specified.
 * @param lineStart line start to look for, must not be null.
 * @param collectMatchLine whether to collect the matching line into <code>buf</code>.
 * @param collectAll whether to collect all lines into <code>buf</code>.
 * @throws IOException If there is a low-level I/O error.
 * @throws NoMoreDataException If the source is exhausted.
 */
 private void read(StringBuilder buf, String lineStart, 
     boolean collectMatchLine, boolean collectAll) throws IOException, NoMoreDataException {
  String sep = "";
  while (true) {
    String line = reader.readLine();

    if (line == null) {
      openNextFile();
      continue;
    }

    if (lineStart!=null && line.startsWith(lineStart)) {
      if (collectMatchLine) {
        buf.append(sep).append(line);
        sep = NEW_LINE;
      }
      return;
    }

    if (collectAll) {
      buf.append(sep).append(line);
      sep = NEW_LINE;
    }
  }
}
 
Developer ID: searchivarius, Project: IndexTextCollect, Lines of code: 37, Source: TrecContentSource.java


Example 4: openNextFile

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
void openNextFile() throws NoMoreDataException, IOException {
  close();
  currPathType = null;
  while (true) {
    if (nextFile >= inputFiles.size()) { 
      // exhausted files, start a new round, unless forever set to false.
      if (!forever) {
        throw new NoMoreDataException();
      }
      nextFile = 0;
      iteration++;
    }
    File f = inputFiles.get(nextFile++);
    if (verbose) {
      System.out.println("opening: " + f + " length: " + f.length());
    }
    try {
      InputStream inputStream = StreamUtils.inputStream(f); // support either gzip, bzip2, or regular text file, by extension  
      reader = new BufferedReader(new InputStreamReader(inputStream, encoding), StreamUtils.BUFFER_SIZE);
      currPathType = TrecDocParser.pathType(f);
      return;
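    } catch (Exception e) {
      // Note: a file that fails to open is skipped only in verbose mode;
      // otherwise the failure is reported as end of data.
      if (verbose) {
        System.out.println("Skipping 'bad' file " + f.getAbsolutePath()+" due to "+e.getMessage());
        continue;
      }
      throw new NoMoreDataException();
    }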
  }
}
 
Developer ID: searchivarius, Project: IndexTextCollect, Lines of code: 31, Source: TrecContentSource.java
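
The cycling logic above is easier to see without the I/O. The sketch below keeps only the "wrap around or signal exhaustion" decision, using plain strings in place of files. `FileCycler`, `SourceExhaustedException`, and `take` are hypothetical names, not part of Lucene. The empty-list check is an addition of this sketch; it is not present in the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for NoMoreDataException (an assumption for this sketch).
class SourceExhaustedException extends Exception {
}

// Hypothetical helper that hands out inputs in order and optionally wraps around.
class FileCycler {
  private final List<String> inputs;
  private final boolean forever;
  private int nextIndex = 0;
  private int iteration = 0;

  FileCycler(List<String> inputs, boolean forever) {
    this.inputs = inputs;
    this.forever = forever;
  }

  // Returns the next input; after the last one, either starts a new round or signals exhaustion.
  String next() throws SourceExhaustedException {
    if (nextIndex >= inputs.size()) {
      // An empty list can never yield data, so it is treated as exhausted as well.
      if (!forever || inputs.isEmpty()) {
        throw new SourceExhaustedException();
      }
      nextIndex = 0;
      iteration++;
    }
    return inputs.get(nextIndex++);
  }

  int iteration() {
    return iteration;
  }
}

class FileCyclerSketch {
  // Collects at most max inputs, stopping early if the cycler reports exhaustion.
  static List<String> take(FileCycler cycler, int max) {
    List<String> out = new ArrayList<>();
    try {
      for (int i = 0; i < max; i++) {
        out.add(cycler.next());
      }
    } catch (SourceExhaustedException e) {
      // Expected when forever is false and every input has been handed out.
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(take(new FileCycler(Arrays.asList("a.txt", "b.txt"), true), 5));
    System.out.println(take(new FileCycler(Arrays.asList("a.txt", "b.txt"), false), 5));
  }
}
```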


Example 5: next

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
String[] next() throws NoMoreDataException {
  if (t == null) {
    threadDone = false;
    t = new Thread(this);
    t.setDaemon(true);
    t.start();
  }
  String[] result;
  synchronized(this){
    while(tuple == null && nmde == null && !threadDone && !stopped) {
      try {
        wait();
      } catch (InterruptedException ie) {
        throw new ThreadInterruptedException(ie);
      }
    }
    if (tuple != null) {
      result = tuple;
      tuple = null;
      notify();
      return result;
    }
    if (nmde != null) {
      // Set to null so we will re-start thread in case
      // we are re-used:
      t = null;
      throw nmde;
    }
    // The thread has exited yet did not hit end of
    // data, so this means it hit an exception.  We
    // throw NoMoreDataException here to force
    // benchmark to stop the current alg:
    throw new NoMoreDataException();
  }
}
 
Developer ID: searchivarius, Project: IndexTextCollect, Lines of code: 36, Source: EnwikiContentSource.java
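
The `next()` method above receives tuples from a background thread through a single shared field guarded by `wait()` and `notify()`. The producer side is not shown in the snippet, so the following is only a sketch of how such a single-slot handoff can work. `SingleSlotHandoff` and `HandoffSketch` are hypothetical names, and for brevity the sketch reports end of data by returning `null` instead of throwing an exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-slot handoff between one producer thread and one consumer.
class SingleSlotHandoff {
  private String slot;   // the pending item, or null when the slot is empty
  private boolean done;  // set once the producer has nothing more to offer

  // Producer side: waits until the slot is free, then publishes the item.
  synchronized void put(String item) throws InterruptedException {
    while (slot != null) {
      wait();
    }
    slot = item;
    notifyAll();
  }

  // Producer side: marks the end of data and wakes a waiting consumer.
  synchronized void finish() {
    done = true;
    notifyAll();
  }

  // Consumer side: returns the next item, or null once the producer has finished.
  synchronized String take() throws InterruptedException {
    while (slot == null && !done) {
      wait();
    }
    if (slot == null) {
      return null; // producer finished and nothing is pending
    }
    String result = slot;
    slot = null;
    notifyAll(); // tell the producer the slot is free again
    return result;
  }
}

class HandoffSketch {
  // Starts a daemon producer thread and collects everything it hands over, in order.
  static List<String> runOnce(List<String> items) {
    SingleSlotHandoff handoff = new SingleSlotHandoff();
    Thread producer = new Thread(() -> {
      try {
        for (String item : items) {
          handoff.put(item);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        handoff.finish();
      }
    });
    producer.setDaemon(true);
    producer.start();

    List<String> received = new ArrayList<>();
    try {
      String item;
      while ((item = handoff.take()) != null) {
        received.add(item);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return received;
  }

  public static void main(String[] args) {
    System.out.println(runOnce(Arrays.asList("t1", "t2", "t3")));
  }
}
```

Both waits sit inside `while` loops so that a wakeup is always rechecked against the actual state, which is the same reason the snippet above loops around `wait()`.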


Example 6: openNextFile

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
void openNextFile() throws NoMoreDataException, IOException {
  close();

  while (true) {
    if (nextFile >= inputFiles.size()) { 
      // exhausted files, start a new round, unless forever set to false.
      if (!forever) {
        throw new NoMoreDataException();
      }
      nextFile = 0;
      iteration++;
    }
    File f = inputFiles.get(nextFile++);
    if (verbose) {
      System.out.println("opening: " + f + " length: " + f.length());
    }
    try {
      // supports gzip, bzip2, or regular text file, extension is used to detect
      InputStream inputStream = StreamUtils.inputStream(f);   
      reader = new DataInputStream(inputStream);
      return;
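    } catch (Exception e) {
      // Note: a file that fails to open is skipped only in verbose mode;
      // otherwise the failure is reported as end of data.
      if (verbose) {
        System.out.println("Skipping 'bad' file " + f.getAbsolutePath()+" due to "+e.getMessage());
        continue;
      }
      throw new NoMoreDataException();
    }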
  }
}
 
Developer ID: searchivarius, Project: IndexTextCollect, Lines of code: 31, Source: ClueWeb09ContentSource.java


Example 7: doSerialTasksWithRate

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
private int doSerialTasksWithRate() throws Exception {
  initTasksArray();
  long delayStep = (perMin ? 60000 : 1000) /rate;
  long nextStartTime = System.currentTimeMillis();
  int count = 0;
  final long t0 = System.currentTimeMillis();
  for (int k=0; (repetitions==REPEAT_EXHAUST && !exhausted) || k<repetitions; k++) {
    if (stopNow) {
      break;
    }
    for (int l=0;l<tasksArray.length;l++) {
      final PerfTask task = tasksArray[l];
      while(!stopNow) {
        long waitMore = nextStartTime - System.currentTimeMillis();
        if (waitMore > 0) {
          // TODO: better to use condition to notify
          Thread.sleep(1);
        } else {
          break;
        }
      }
      if (stopNow) {
        break;
      }
      nextStartTime += delayStep; // this aims at average rate.
      try {
        final int inc = task.runAndMaybeStats(letChildReport);
        count += inc;
        if (countsByTime != null) {
          final int slot = (int) ((System.currentTimeMillis()-t0)/logByTimeMsec);
          if (slot >= countsByTime.length) {
            countsByTime = ArrayUtil.grow(countsByTime, 1+slot);
          }
          countsByTime[slot] += inc;
        }

        if (anyExhaustibleTasks)
          updateExhausted(task);
      } catch (NoMoreDataException e) {
        exhausted = true;
      }
    }
  }
  stopNow = false;
  return count;
}
 
Developer ID: europeana, Project: search, Lines of code: 47, Source: TaskSequence.java
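
The pacing above comes down to one division: the delay between task starts is 60000 or 1000 milliseconds divided by the rate. The sketch below isolates that arithmetic without sleeping. It assumes `rate` is an integer, as the integer literals in the snippet suggest; `RatePacingSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;

class RatePacingSketch {
  // Milliseconds between consecutive task starts. Integer division truncates,
  // so a per-second rate above 1000 yields a step of 0, which disables pacing.
  static long delayStepMillis(int rate, boolean perMin) {
    if (rate <= 0) {
      throw new IllegalArgumentException("rate must be positive");
    }
    return (perMin ? 60000L : 1000L) / rate;
  }

  // Target start times for count tasks, beginning at firstStartMillis.
  static long[] plannedStarts(long firstStartMillis, int count, int rate, boolean perMin) {
    long step = delayStepMillis(rate, perMin);
    long[] starts = new long[count];
    long next = firstStartMillis;
    for (int i = 0; i < count; i++) {
      starts[i] = next;
      next += step; // fixed grid: a late task does not shift the following targets
    }
    return starts;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(plannedStarts(0L, 4, 4, false)));
  }
}
```

Advancing the target by a fixed step, instead of by "now plus step", is what makes the schedule aim at an average rate: a task that starts late is followed by a shorter wait.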


Example 8: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public DocData getNextDocData(DocData docData)
    throws NoMoreDataException, IOException {
  return docData;
}
 
Developer ID: europeana, Project: search, Lines of code: 6, Source: TestPerfTasksParse.java


Example 9: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  String name = null;
  StringBuilder docBuf = getDocBuffer();
  ParsePathType parsedPathType;
  
  // protect reading from the TREC files by multiple threads. The rest of the
  // method, i.e., parsing the content and returning the DocData can run unprotected.
  synchronized (lock) {
    if (reader == null) {
      openNextFile();
    }
    
    // 1. skip until doc start - required for all TREC formats
    docBuf.setLength(0);
    read(docBuf, DOC, false, false);
    
    // save parsedFile for passing trecDataParser after the sync block, in 
    // case another thread will open another file in between.
    parsedPathType = currPathType;
    
    // 2. name - required for all TREC formats
    docBuf.setLength(0);
    read(docBuf, DOCNO, true, false);
    name = docBuf.substring(DOCNO.length(), docBuf.indexOf(TERMINATING_DOCNO,
        DOCNO.length())).trim();
    
    if (!excludeDocnameIteration) {
      name = name + "_" + iteration;
    }

    // 3. read all until end of doc
    docBuf.setLength(0);
    read(docBuf, TERMINATING_DOC, false, true);
  }
    
  // count char length of text to be parsed (may be larger than the resulted plain doc body text).
  addBytes(docBuf.length()); 

  // This code segment relies on HtmlParser being thread safe. When we get 
  // here, everything else is already private to that thread, so we're safe.
  docData = trecDocParser.parse(docData, name, this, docBuf, parsedPathType);
  addItem();

  return docData;
}
 
Developer ID: searchivarius, Project: IndexTextCollect, Lines of code: 47, Source: TrecContentSource.java
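
Step 2 above extracts the document name with a `substring`/`indexOf` pair. If the closing tag is missing, `indexOf` returns -1 and `substring` throws. The sketch below shows the same extraction with explicit checks. The tag literals are assumptions, since the snippet does not show the constants, and `DocnoSketch.extractName` is a hypothetical helper.

```java
class DocnoSketch {
  // Assumed tag literals; the constants used by the snippet above are not shown there.
  static final String DOCNO = "<DOCNO>";
  static final String TERMINATING_DOCNO = "</DOCNO>";

  // Returns the trimmed text between the tags, or null if the line is not well formed.
  static String extractName(CharSequence line) {
    String s = line.toString();
    if (!s.startsWith(DOCNO)) {
      return null;
    }
    int end = s.indexOf(TERMINATING_DOCNO, DOCNO.length());
    if (end < 0) {
      return null; // closing tag missing: report it instead of letting substring throw
    }
    return s.substring(DOCNO.length(), end).trim();
  }

  public static void main(String[] args) {
    System.out.println(extractName(new StringBuilder("<DOCNO> WSJ870324-0001 </DOCNO>")));
  }
}
```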


Example 10: getNextDocData

import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  WarcRecord  CurrRec = null;
  
  // protect reading from the WARC files by multiple threads. The rest of the
  // method, i.e., parsing the content and returning the DocData can run unprotected.
  synchronized (lock) {
    if (reader == null) {
      openNextFile();
    }
    
    do {
      CurrRec = WarcRecord.readNextWarcRecord(reader);
      /*
       *  We need to skip special auxiliary entries, e.g., in the
       *  beginning of the file.
       */
      
    } while (CurrRec != null && !CurrRec.getHeaderRecordType().equals("response"));
    
    if (CurrRec == null) {
      openNextFile();
      return getNextDocData(docData);
    }
  }
      
 
  Date    date = parseDate(CurrRec.getHeaderMetadataItem("WARC-Date"));    
  String  url = CurrRec.getHeaderMetadataItem("WARC-Target-URI");
    
  // This code segment relies on HtmlParser being thread safe. When we get 
  // here, everything else is already private to that thread, so we're safe.
  if (url.startsWith("http://") || 
      url.startsWith("ftp://") ||
      url.startsWith("https://")
      ) {          
    String Response = CurrRec.getContentUTF8();

    int EndOfHead = Response.indexOf("\n\n");
    
    if (EndOfHead >= 0) {
      String html = Response.substring(EndOfHead + 2);

      docData = htmlParser.parse(docData, url, date, new StringReader(html), this);
      // This should be done after parse(), b/c parse() resets properties
      docData.getProps().put("url", url);
    } else {
      /*
       *  TODO: @leo What do we do here exactly? 
       *  The interface doesn't allow us to signal that an entry should be skipped. 
       */    
      System.err.println("Cannot extract HTML in URI: " + url);          
    }
  } else {
    /*
     *  TODO: @leo What do we do here exactly? 
     *  The interface doesn't allow us to signal that an entry should be skipped. 
     */    
    System.err.println("Ignoring scheme in URI: " + url);
  }

  addItem();

  return docData;
}
 
Developer ID: searchivarius, Project: IndexTextCollect, Lines of code: 68, Source: ClueWeb09ContentSource.java
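
The method above separates the response headers from the HTML by looking for the first blank line, written as `"\n\n"`. A minimal sketch of that split follows. Accepting a CRLF-delimited blank line as well is an addition of this sketch, not something the code above does, and `ResponseSplitSketch.bodyOf` is a hypothetical helper.

```java
class ResponseSplitSketch {
  // Returns everything after the first blank line, or null if there is none.
  // Both LF ("\n\n") and CRLF ("\r\n\r\n") blank lines are recognized.
  static String bodyOf(String response) {
    int lf = response.indexOf("\n\n");
    int crlf = response.indexOf("\r\n\r\n");
    if (crlf >= 0 && (lf < 0 || crlf < lf)) {
      return response.substring(crlf + 4);
    }
    if (lf >= 0) {
      return response.substring(lf + 2);
    }
    return null; // no header/body separator found
  }

  public static void main(String[] args) {
    System.out.println(bodyOf("HTTP/1.1 200 OK\nContent-Type: text/html\n\n<html></html>"));
  }
}
```

Returning `null` leaves the decision about unparseable records to the caller, which is the case the TODO comments above leave open.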



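Note: The org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException class examples in this article were collected from source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.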

