This article collects typical usage examples of the Java class org.apache.tika.parser.pdf.PDFParser. If you are wondering how PDFParser is used in practice, the selected examples below may help.
The PDFParser class belongs to the org.apache.tika.parser.pdf package. Six code examples are shown below, sorted by popularity by default.
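Example 1: fromFile
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser; // the class this example depends on
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

@Override
public String fromFile(File file) {
    // try-with-resources: parse() consumes the stream but leaves closing it to the caller
    try (InputStream inputstream = new FileInputStream(file)) {
        // -1 disables the handler's write limit, so long documents are not cut off
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();
        // Order extracted text by its position on the page
        PDFParserConfig config = new PDFParserConfig();
        config.setSortByPosition(true);
        PDFParser pdfparser = new PDFParser();
        pdfparser.setPDFParserConfig(config);
        System.out.println("Parsing PDF to TEXT...");
        pdfparser.parse(inputstream, handler, metadata, pcontext);
        System.out.println("Parsing complete");
        return handler.toString();
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}
Developer: eduardohmg, project: diario-extractor, lines of code: 29, source: PDFToTextImpl.java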
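Example 1 hands a FileInputStream to PDFParser.parse, which consumes the stream but leaves closing it to the caller. The sketch below uses only the Java standard library to illustrate why try-with-resources suits that situation: the stream is closed on both the success path and the failure path. CloseOnFailureDemo, TrackingStream, StreamConsumer, and readAndClose are hypothetical names invented for this illustration; they are not part of Tika, and the consumer lambda merely stands in for a parser call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CloseOnFailureDemo {

    /** Hypothetical stand-in for a call such as PDFParser.parse; not a Tika type. */
    interface StreamConsumer {
        String consume(InputStream in) throws IOException;
    }

    /** In-memory stream that records whether close() was called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Runs the consumer and closes the stream on every exit path. */
    static String readAndClose(InputStream stream, StreamConsumer consumer) {
        try (InputStream in = stream) {
            return consumer.consume(in);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Success path: the content is read and the stream is closed afterwards.
        TrackingStream ok = new TrackingStream("hello".getBytes(StandardCharsets.UTF_8));
        String text = readAndClose(ok, in -> new String(in.readAllBytes(), StandardCharsets.UTF_8));
        System.out.println(text + ", closed: " + ok.closed);

        // Failure path: the consumer throws, yet the stream is still closed.
        TrackingStream bad = new TrackingStream(new byte[0]);
        try {
            readAndClose(bad, in -> { throw new IOException("simulated parse failure"); });
        } catch (UncheckedIOException expected) {
            System.out.println("failed, closed: " + bad.closed);
        }
    }
}
```

In real code the lambda would be replaced by the actual parse call and the result read from the content handler, as in Example 1.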
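Example 2: parse
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser; // the class this example depends on
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

private static String parse(final InputStream input) throws TikaException, SAXException, IOException {
    final Parser parser = new PDFParser();
    final ContentHandler handler = new BodyContentHandler(); // default write limit: 100,000 characters
    final Metadata metadata = new Metadata();
    final ParseContext parseContext = new ParseContext();
    parser.parse(input, handler, metadata, parseContext); // the caller remains responsible for closing input
    return handler.toString();
}
Developer: tnovo, project: which-food-uptec-cli, lines of code: 11, source: App.java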
Example 3: importData
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser; // the class this example depends on
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
// JAX-RS, multipart, JSON, and project-specific imports are omitted, as in the original snippet

@POST
@Path("import")
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Produces(MediaType.APPLICATION_JSON)
public Response importData(@DefaultValue("true") @FormDataParam("enabled") boolean enabled,
        @FormDataParam("file") InputStream inputStream,
        @FormDataParam("file") FormDataContentDisposition fileDetail, @Context UriInfo uriInfo)
        throws JSONException, IOException, SAXException, TikaException {
    // -1 disables the handler's write limit
    BodyContentHandler handler = new BodyContentHandler(-1);
    Metadata metadata = new Metadata();
    ParseContext pcontext = new ParseContext();

    // Parse the uploaded document with the PDF parser
    PDFParser pdfparser = new PDFParser();
    pdfparser.parse(inputStream, handler, metadata, pcontext);

    // Split the extracted text into sentences
    String[] sentences = turOpenNLPConnector.sentenceDetect(handler.toString());

    // Persist the uploaded file's record
    TurData turData = new TurData();
    turData.setName(fileDetail.getFileName());
    turData.setType(FilenameUtils.getExtension(fileDetail.getFileName()));
    this.turDataRepository.save(turData);

    // Persist each sentence, linked to the file record
    for (String sentence : sentences) {
        TurDataGroupSentence turDataGroupSentence = new TurDataGroupSentence();
        turDataGroupSentence.setTurData(turData);
        turDataGroupSentence.setSentence(sentence);
        this.turDataGroupSentenceRepository.save(turDataGroupSentence);
    }

    JSONObject jsonTraining = new JSONObject();
    jsonTraining.put("sentences", sentences);
    return Response.status(200).entity(jsonTraining.toString()).build();
}
Developer: openviglet, project: turing, lines of code: 39, source: TurMLDataAPI.java
Example 4: getParser
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser; // the class this example depends on

@Override
protected Parser getParser() {
    return new PDFParser();
}
Developer: Alfresco, project: alfresco-repository, lines of code: 5, source: PdfBoxContentTransformer.java
Example 5: getParser
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser; // the class this example depends on

@Override
protected Parser getParser()
{
    return new PDFParser();
}
Developer: Alfresco, project: alfresco-repository, lines of code: 6, source: PdfBoxMetadataExtracter.java
Example 6: importData
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser; // the class this example depends on
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
// JAX-RS, multipart, JSON, and project-specific imports are omitted, as in the original snippet

@POST
@Path("import")
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Produces(MediaType.APPLICATION_JSON)
public TurDataGroupData importData(@PathParam("dataGroupId") int dataGroupId,
        @DefaultValue("true") @FormDataParam("enabled") boolean enabled,
        @FormDataParam("file") InputStream inputStream,
        @FormDataParam("file") FormDataContentDisposition fileDetail, @Context UriInfo uriInfo)
        throws JSONException, IOException, SAXException, TikaException {
    TurDataGroup turDataGroup = this.turDataGroupRepository.getOne(dataGroupId);

    // -1 disables the handler's write limit
    BodyContentHandler handler = new BodyContentHandler(-1);
    Metadata metadata = new Metadata();
    ParseContext pcontext = new ParseContext();

    // Parse the uploaded document with the PDF parser
    PDFParser pdfparser = new PDFParser();
    pdfparser.parse(inputStream, handler, metadata, pcontext);

    // Split the extracted text into sentences
    String[] sentences = turOpenNLPConnector.sentenceDetect(handler.toString());

    // Persist the uploaded file's record
    TurData turData = new TurData();
    turData.setName(fileDetail.getFileName());
    turData.setType(FilenameUtils.getExtension(fileDetail.getFileName()));
    this.turDataRepository.save(turData);

    // Persist each sentence, linked to both the file record and the data group
    for (String sentence : sentences) {
        TurDataGroupSentence turDataGroupSentence = new TurDataGroupSentence();
        turDataGroupSentence.setTurData(turData);
        turDataGroupSentence.setSentence(sentence);
        turDataGroupSentence.setTurDataGroup(turDataGroup);
        this.turDataGroupSentenceRepository.save(turDataGroupSentence);
    }

    // Link the file record to the data group
    TurDataGroupData turDataGroupData = new TurDataGroupData();
    turDataGroupData.setTurData(turData);
    turDataGroupData.setTurDataGroup(turDataGroup);
    this.turDataGroupDataRepository.save(turDataGroupData);
    return turDataGroupData;
}
Developer: openviglet, project: turing, lines of code: 45, source: TurMLDataGroupDataAPI.java
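Note: The org.apache.tika.parser.pdf.PDFParser examples in this article were compiled from GitHub, MSDocs, and other source-code and documentation platforms. The snippets were selected from open-source projects, and copyright remains with the original authors. Refer to each project's License before distributing or using the code; do not reproduce without permission.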