This article collects typical usage examples of the Java class org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException. If you are wondering how NoMoreDataException is used in practice, the selected examples below may help.
The NoMoreDataException class belongs to the org.apache.lucene.benchmark.byTask.feeds package. Ten code examples of the class are shown below, sorted by popularity by default.
Example 1: getNextDocData
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public synchronized DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  String[] tuple = parser.next();
  docData.clear();
  docData.setName(tuple[ID]);
  docData.setBody(tuple[TITLE] + " " + tuple[BODY]);
  docData.setDate(tuple[DATE]);
  docData.setTitle(tuple[TITLE]);
  /*
   * TODO: @leo This is not a real URL, maybe we will need a real URL some day.
   *       This should be fine for sorting purposes, though. If the input
   *       is unsorted and we want to produce sorted document ids,
   *       this is just fine.
   */
  Properties props = new Properties();
  props.put("url", tuple[TITLE]);
  docData.setProps(props);
  return docData;
}
Developer: searchivarius, Project: IndexTextCollect, Lines of code: 20, Source: EnwikiContentSource.java
Example 2: extract
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
public void extract() throws Exception {
  Document doc = null;
  System.out.println("Starting Extraction");
  long start = System.currentTimeMillis();
  try {
    while ((doc = docMaker.makeDocument()) != null) {
      create(doc.get(DocMaker.ID_FIELD), doc.get(DocMaker.TITLE_FIELD),
          doc.get(DocMaker.DATE_FIELD), doc.get(DocMaker.BODY_FIELD));
    }
  } catch (NoMoreDataException e) {
    // the source is exhausted: this is the normal end of the loop, so continue
  }
  long finish = System.currentTimeMillis();
  System.out.println("Extraction took " + (finish - start) + " ms");
}
Developer: europeana, Project: search, Lines of code: 16, Source: ExtractWikipedia.java
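The loop in extract() treats the exception as an ordinary end-of-stream signal rather than as a failure. A minimal, self-contained sketch of that pattern follows. It does not use Lucene: NoMoreData and ListSource are hypothetical stand-ins introduced only for illustration, so the sketch compiles without the lucene-benchmark jar.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class DrainSketch {
  // Hypothetical stand-in for Lucene's NoMoreDataException (an assumption of this sketch).
  static class NoMoreData extends Exception {}

  // Hypothetical source that signals exhaustion by throwing instead of returning null.
  static class ListSource {
    private final Iterator<String> it;

    ListSource(List<String> items) {
      this.it = items.iterator();
    }

    String next() throws NoMoreData {
      if (!it.hasNext()) {
        throw new NoMoreData();
      }
      return it.next();
    }
  }

  // Reads until the source reports exhaustion; the exception is the normal exit path.
  static List<String> drain(ListSource source) {
    List<String> out = new ArrayList<>();
    try {
      while (true) {
        out.add(source.next());
      }
    } catch (NoMoreData e) {
      // Expected: the source has no more items.
    }
    return out;
  }

  public static void main(String[] args) {
    // prints [doc1, doc2, doc3]
    System.out.println(drain(new ListSource(List.of("doc1", "doc2", "doc3"))));
  }
}
```

Catching the exception outside the loop keeps the loop body free of end-of-data checks, which is the same shape extract() uses.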
Example 3: read
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
/**
 * Read until a line starting with the specified <code>lineStart</code>.
 * @param buf buffer for collecting the data if so specified.
 * @param lineStart line start to look for, must not be null.
 * @param collectMatchLine whether to collect the matching line into <code>buf</code>.
 * @param collectAll whether to collect all lines into <code>buf</code>.
 * @throws IOException If there is a low-level I/O error.
 * @throws NoMoreDataException If the source is exhausted.
 */
private void read(StringBuilder buf, String lineStart,
    boolean collectMatchLine, boolean collectAll) throws IOException, NoMoreDataException {
  String sep = "";
  while (true) {
    String line = reader.readLine();
    if (line == null) {
      // current file is exhausted; openNextFile() throws NoMoreDataException
      // once no files remain and the source is not set to run forever
      openNextFile();
      continue;
    }
    if (lineStart != null && line.startsWith(lineStart)) {
      if (collectMatchLine) {
        buf.append(sep).append(line);
        sep = NEW_LINE;
      }
      return;
    }
    if (collectAll) {
      buf.append(sep).append(line);
      sep = NEW_LINE;
    }
  }
}
Developer: searchivarius, Project: IndexTextCollect, Lines of code: 37, Source: TrecContentSource.java
Example 4: openNextFile
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
void openNextFile() throws NoMoreDataException, IOException {
  close();
  currPathType = null;
  while (true) {
    if (nextFile >= inputFiles.size()) {
      // exhausted files, start a new round, unless forever set to false.
      if (!forever) {
        throw new NoMoreDataException();
      }
      nextFile = 0;
      iteration++;
    }
    File f = inputFiles.get(nextFile++);
    if (verbose) {
      System.out.println("opening: " + f + " length: " + f.length());
    }
    try {
      // supports gzip, bzip2, or regular text files; the format is detected by extension
      InputStream inputStream = StreamUtils.inputStream(f);
      reader = new BufferedReader(new InputStreamReader(inputStream, encoding), StreamUtils.BUFFER_SIZE);
      currPathType = TrecDocParser.pathType(f);
      return;
    } catch (Exception e) {
      // in verbose mode a bad file is reported and skipped;
      // otherwise it ends the run as if the data were exhausted
      if (verbose) {
        System.out.println("Skipping 'bad' file " + f.getAbsolutePath() + " due to " + e.getMessage());
        continue;
      }
      throw new NoMoreDataException();
    }
  }
}
Developer: searchivarius, Project: IndexTextCollect, Lines of code: 31, Source: TrecContentSource.java
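The wrap-around logic in openNextFile() (restart from the first file when forever is set, otherwise signal exhaustion) can be isolated into a small sketch. This is an illustration under stated assumptions, not the Lucene implementation: FileRotationSketch and NoMoreData are hypothetical names, file names stand in for real files, and the isEmpty() guard is an addition of the sketch that the original method does not have.

```java
import java.util.List;

class FileRotationSketch {
  // Hypothetical stand-in for Lucene's NoMoreDataException (an assumption of this sketch).
  static class NoMoreData extends Exception {}

  private final List<String> files;
  private final boolean forever;
  private int nextFile = 0;
  private int iteration = 0;

  FileRotationSketch(List<String> files, boolean forever) {
    this.files = files;
    this.forever = forever;
  }

  // Returns the next file name, wrapping around when 'forever' is true.
  String nextFileName() throws NoMoreData {
    if (nextFile >= files.size()) {
      // Guard against an empty list (added here; not in the original method).
      if (!forever || files.isEmpty()) {
        throw new NoMoreData();
      }
      nextFile = 0;
      iteration++;
    }
    return files.get(nextFile++);
  }

  // Number of completed passes over the file list.
  int iteration() {
    return iteration;
  }

  public static void main(String[] args) throws Exception {
    FileRotationSketch loop = new FileRotationSketch(List.of("a.txt", "b.txt"), true);
    for (int i = 0; i < 5; i++) {
      String name = loop.nextFileName();
      System.out.println(name + " (iteration " + loop.iteration() + ")");
    }

    FileRotationSketch once = new FileRotationSketch(List.of("a.txt"), false);
    once.nextFileName();
    try {
      once.nextFileName();
    } catch (NoMoreData e) {
      System.out.println("exhausted after one pass");
    }
  }
}
```

The iteration counter is what Example 9 appends to document names, so that documents read in a later pass get distinct names.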
Example 5: next
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
String[] next() throws NoMoreDataException {
  if (t == null) {
    threadDone = false;
    t = new Thread(this);
    t.setDaemon(true);
    t.start();
  }
  String[] result;
  synchronized (this) {
    while (tuple == null && nmde == null && !threadDone && !stopped) {
      try {
        wait();
      } catch (InterruptedException ie) {
        throw new ThreadInterruptedException(ie);
      }
    }
    if (tuple != null) {
      result = tuple;
      tuple = null;
      notify();
      return result;
    }
    if (nmde != null) {
      // Set to null so we will re-start thread in case
      // we are re-used:
      t = null;
      throw nmde;
    }
    // The thread has exited yet did not hit end of
    // data, so this means it hit an exception. We
    // throw NoMoreDataException here to force
    // benchmark to stop the current alg:
    throw new NoMoreDataException();
  }
}
Developer: searchivarius, Project: IndexTextCollect, Lines of code: 36, Source: EnwikiContentSource.java
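next() above is the consumer half of a single-slot handoff: a background parser thread fills one slot, and the caller waits on the monitor until the slot is filled or the producer has finished. The sketch below is a simplified variation for illustration only. HandoffSketch and NoMoreData are hypothetical names, it uses a done flag and notifyAll() where the original stores the exception and calls notify(), and it hands over plain strings instead of parsed tuples.

```java
class HandoffSketch implements Runnable {
  // Hypothetical stand-in for Lucene's NoMoreDataException (an assumption of this sketch).
  static class NoMoreData extends Exception {}

  private final String[] items;
  private String slot;      // single handoff slot, guarded by 'this'
  private boolean done;     // set when the producer has finished, guarded by 'this'
  private Thread producer;  // started lazily on the first call to next()

  HandoffSketch(String... items) {
    this.items = items;
  }

  // Producer: publishes one item at a time and waits until it has been taken.
  @Override
  public void run() {
    try {
      for (String item : items) {
        synchronized (this) {
          while (slot != null) {
            wait();
          }
          slot = item;
          notifyAll();
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      synchronized (this) {
        done = true;
        notifyAll();
      }
    }
  }

  // Consumer: blocks until an item is available or the producer is done.
  synchronized String next() throws NoMoreData, InterruptedException {
    if (producer == null) {
      producer = new Thread(this);
      producer.setDaemon(true);
      producer.start();
    }
    while (slot == null && !done) {
      wait();
    }
    if (slot != null) {
      String result = slot;
      slot = null;
      notifyAll(); // wake the producer so it can publish the next item
      return result;
    }
    throw new NoMoreData();
  }

  public static void main(String[] args) throws InterruptedException {
    HandoffSketch source = new HandoffSketch("t1", "t2", "t3");
    try {
      while (true) {
        System.out.println(source.next());
      }
    } catch (NoMoreData e) {
      System.out.println("no more data");
    }
  }
}
```

Both sides re-check their condition in a while loop around wait(), which protects against spurious wakeups and against being woken by a notification meant for the other side.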
Example 6: openNextFile
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
void openNextFile() throws NoMoreDataException, IOException {
  close();
  while (true) {
    if (nextFile >= inputFiles.size()) {
      // exhausted files, start a new round, unless forever set to false.
      if (!forever) {
        throw new NoMoreDataException();
      }
      nextFile = 0;
      iteration++;
    }
    File f = inputFiles.get(nextFile++);
    if (verbose) {
      System.out.println("opening: " + f + " length: " + f.length());
    }
    try {
      // supports gzip, bzip2, or regular text file, extension is used to detect
      InputStream inputStream = StreamUtils.inputStream(f);
      reader = new DataInputStream(inputStream);
      return;
    } catch (Exception e) {
      if (verbose) {
        System.out.println("Skipping 'bad' file " + f.getAbsolutePath() + " due to " + e.getMessage());
        continue;
      }
      throw new NoMoreDataException();
    }
  }
}
Developer: searchivarius, Project: IndexTextCollect, Lines of code: 31, Source: ClueWeb09ContentSource.java
Example 7: doSerialTasksWithRate
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
private int doSerialTasksWithRate() throws Exception {
  initTasksArray();
  long delayStep = (perMin ? 60000 : 1000) / rate;
  long nextStartTime = System.currentTimeMillis();
  int count = 0;
  final long t0 = System.currentTimeMillis();
  for (int k = 0; (repetitions == REPEAT_EXHAUST && !exhausted) || k < repetitions; k++) {
    if (stopNow) {
      break;
    }
    for (int l = 0; l < tasksArray.length; l++) {
      final PerfTask task = tasksArray[l];
      while (!stopNow) {
        long waitMore = nextStartTime - System.currentTimeMillis();
        if (waitMore > 0) {
          // TODO: better to use condition to notify
          Thread.sleep(1);
        } else {
          break;
        }
      }
      if (stopNow) {
        break;
      }
      nextStartTime += delayStep; // this aims at average rate.
      try {
        final int inc = task.runAndMaybeStats(letChildReport);
        count += inc;
        if (countsByTime != null) {
          final int slot = (int) ((System.currentTimeMillis() - t0) / logByTimeMsec);
          if (slot >= countsByTime.length) {
            countsByTime = ArrayUtil.grow(countsByTime, 1 + slot);
          }
          countsByTime[slot] += inc;
        }
        if (anyExhaustibleTasks) {
          updateExhausted(task);
        }
      } catch (NoMoreDataException e) {
        // a content source ran dry: mark the sequence exhausted so the outer loop can stop
        exhausted = true;
      }
    }
  }
  stopNow = false;
  return count;
}
Developer: europeana, Project: search, Lines of code: 47, Source: TaskSequence.java
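The pacing in doSerialTasksWithRate() advances each deadline from the previous deadline, not from the time the previous task actually finished, which is why it targets an average rate. The arithmetic can be checked in isolation with the sketch below. PacingSketch and its method names are hypothetical, and the sketch only computes the schedule; it does not sleep or run tasks.

```java
import java.util.Arrays;

class PacingSketch {
  // Same integer arithmetic as the delayStep line above: milliseconds between task starts.
  // Note the integer division: a rate of 3 per second gives 333 ms, not 333.33.
  static long delayStepMillis(int rate, boolean perMin) {
    return (perMin ? 60000 : 1000) / rate;
  }

  // Target start times for n tasks, the first one starting at t0.
  static long[] startTimes(long t0, int rate, boolean perMin, int n) {
    long step = delayStepMillis(rate, perMin);
    long[] times = new long[n];
    long next = t0;
    for (int i = 0; i < n; i++) {
      times[i] = next;
      next += step; // advance from the previous deadline, not from "now"
    }
    return times;
  }

  public static void main(String[] args) {
    // 4 tasks per second: prints [0, 250, 500]
    System.out.println(Arrays.toString(startTimes(0, 4, false, 3)));
    // 120 tasks per minute: prints 500
    System.out.println(delayStepMillis(120, true));
  }
}
```

One consequence of the integer division is that a per-second rate above 1000 yields a step of 0 ms, so such a sequence is effectively not throttled.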
Example 8: getNextDocData
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public DocData getNextDocData(DocData docData)
    throws NoMoreDataException, IOException {
  return docData;
}
Developer: europeana, Project: search, Lines of code: 6, Source: TestPerfTasksParse.java
Example 9: getNextDocData
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  String name = null;
  StringBuilder docBuf = getDocBuffer();
  ParsePathType parsedPathType;
  // protect reading from the TREC files by multiple threads. The rest of the
  // method, i.e., parsing the content and returning the DocData can run unprotected.
  synchronized (lock) {
    if (reader == null) {
      openNextFile();
    }
    // 1. skip until doc start - required for all TREC formats
    docBuf.setLength(0);
    read(docBuf, DOC, false, false);
    // save parsedFile for passing trecDataParser after the sync block, in
    // case another thread will open another file in between.
    parsedPathType = currPathType;
    // 2. name - required for all TREC formats
    docBuf.setLength(0);
    read(docBuf, DOCNO, true, false);
    name = docBuf.substring(DOCNO.length(),
        docBuf.indexOf(TERMINATING_DOCNO, DOCNO.length())).trim();
    if (!excludeDocnameIteration) {
      name = name + "_" + iteration;
    }
    // 3. read all until end of doc
    docBuf.setLength(0);
    read(docBuf, TERMINATING_DOC, false, true);
  }
  // count char length of text to be parsed (may be larger than the resulted plain doc body text).
  addBytes(docBuf.length());
  // This code segment relies on HtmlParser being thread safe. When we get
  // here, everything else is already private to that thread, so we're safe.
  docData = trecDocParser.parse(docData, name, this, docBuf, parsedPathType);
  addItem();
  return docData;
}
Developer: searchivarius, Project: IndexTextCollect, Lines of code: 47, Source: TrecContentSource.java
Example 10: getNextDocData
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException; // import the required package/class
@Override
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
  WarcRecord currRec = null;
  // protect reading from the WARC files by multiple threads. The rest of the
  // method, i.e., parsing the content and returning the DocData can run unprotected.
  synchronized (lock) {
    if (reader == null) {
      openNextFile();
    }
    do {
      currRec = WarcRecord.readNextWarcRecord(reader);
      /*
       * We need to skip special auxiliary entries, e.g., in the
       * beginning of the file.
       */
    } while (currRec != null && !currRec.getHeaderRecordType().equals("response"));
    if (currRec == null) {
      // current file is exhausted: move on to the next one and retry
      openNextFile();
      return getNextDocData(docData);
    }
  }
  Date date = parseDate(currRec.getHeaderMetadataItem("WARC-Date"));
  String url = currRec.getHeaderMetadataItem("WARC-Target-URI");
  // This code segment relies on HtmlParser being thread safe. When we get
  // here, everything else is already private to that thread, so we're safe.
  if (url.startsWith("http://") ||
      url.startsWith("ftp://") ||
      url.startsWith("https://")) {
    String response = currRec.getContentUTF8();
    // the HTTP headers end at the first blank line
    int endOfHead = response.indexOf("\n\n");
    if (endOfHead >= 0) {
      String html = response.substring(endOfHead + 2);
      docData = htmlParser.parse(docData, url, date, new StringReader(html), this);
      // This should be done after parse(), b/c parse() resets properties
      docData.getProps().put("url", url);
    } else {
      /*
       * TODO: @leo What do we do here exactly?
       * The interface doesn't allow us to signal that an entry should be skipped.
       */
      System.err.println("Cannot extract HTML in URI: " + url);
    }
  } else {
    /*
     * TODO: @leo What do we do here exactly?
     * The interface doesn't allow us to signal that an entry should be skipped.
     */
    System.err.println("Ignoring scheme in URI: " + url);
  }
  addItem();
  return docData;
}
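Developer: searchivarius, Project: IndexTextCollect, Lines of code: 68, Source: ClueWeb09ContentSource.java
Note: The examples of the org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException class in this article were collected from GitHub, MSDocs, and other source-code and documentation hosting platforms. The snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. Refer to each project's license before distributing or using the code; do not reproduce without permission.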