
Java TikaCoreProperties Class Code Examples


This article collects typical usage examples of org.apache.tika.metadata.TikaCoreProperties in Java. If you want to see how the TikaCoreProperties class is used in practice, the selected examples below may help.



TikaCoreProperties belongs to the org.apache.tika.metadata package. The 20 examples below show how the class is used and are sorted by popularity by default.

Example 1: handle

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
public void handle(Directory directory, Metadata metadata)
        throws MetadataException {
    if (directory.containsTag(IptcDirectory.TAG_KEYWORDS)) {
        String[] keywords = directory.getStringArray(IptcDirectory.TAG_KEYWORDS);
        for (String k : keywords) {
            metadata.add(TikaCoreProperties.KEYWORDS, k);
        }
    }
    if (directory.containsTag(IptcDirectory.TAG_HEADLINE)) {
        metadata.set(TikaCoreProperties.TITLE, directory.getString(IptcDirectory.TAG_HEADLINE));
    } else if (directory.containsTag(IptcDirectory.TAG_OBJECT_NAME)) {
        metadata.set(TikaCoreProperties.TITLE, directory.getString(IptcDirectory.TAG_OBJECT_NAME));
    }
    if (directory.containsTag(IptcDirectory.TAG_BY_LINE)) {
        metadata.set(TikaCoreProperties.CREATOR, directory.getString(IptcDirectory.TAG_BY_LINE));
        metadata.set(IPTC.CREATOR, directory.getString(IptcDirectory.TAG_BY_LINE));
    }
    if (directory.containsTag(IptcDirectory.TAG_CAPTION)) {
        metadata.set(TikaCoreProperties.DESCRIPTION,
                // Looks like metadata extractor returns IPTC newlines as a single carriage return,
                // but the exiv2 command does not so we change to line feed here because that is less surprising to users                        
                directory.getString(IptcDirectory.TAG_CAPTION).replaceAll("\r\n?", "\n"));
    }
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 25, Source file: ImageMetadataExtractor.java
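The two rules in `handle` that are easiest to get wrong are the caption newline normalization and the headline-before-object-name fallback for the title. The sketch below isolates them in plain Java. A `Map<String, String>` with the hypothetical keys `"headline"` and `"objectName"` stands in for metadata-extractor's `Directory`, so this is an illustration, not the project's code.

```java
import java.util.Map;

/** Sketch of two rules from the IPTC handler above.
 *  A plain Map with hypothetical keys replaces metadata-extractor's Directory. */
final class IptcRulesSketch {

    /** Convert CRLF pairs and lone CRs to LF, as the handler does for captions. */
    static String normalizeNewlines(String caption) {
        return caption.replaceAll("\r\n?", "\n");
    }

    /** Prefer the headline, fall back to the object name, or return null if neither exists. */
    static String pickTitle(Map<String, String> iptc) {
        String headline = iptc.get("headline");   // hypothetical key
        if (headline != null) {
            return headline;
        }
        return iptc.get("objectName");            // hypothetical key
    }
}
```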


Example 2: parseEmbedded

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Override
public void parseEmbedded(final InputStream input, final ContentHandler handler, final Metadata metadata,
                          final boolean outputHtml) throws SAXException, IOException {

	// There's no need to spawn inline embeds, like images in PDFs. These should be concatenated to the main
	// document as usual.
	if (TikaCoreProperties.EmbeddedResourceType.INLINE.toString().equals(metadata
			.get(TikaCoreProperties.EMBEDDED_RESOURCE_TYPE))) {
		final ContentHandler embedHandler = new EmbeddedContentHandler(new BodyContentHandler(handler));

		if (outputHtml) {
			writeStart(handler, metadata);
		}

		delegateParsing(input, embedHandler, metadata);

		if (outputHtml) {
			writeEnd(handler);
		}
	} else {
		try (final TikaInputStream tis = TikaInputStream.get(input)) {
			spawnEmbedded(tis, metadata);
		}
	}
}
 
Developer ID: ICIJ, Project: extract, Lines of code: 26, Source file: EmbedSpawner.java
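The check in `parseEmbedded` calls `equals` on the constant's string form rather than on the metadata value, which keeps it null-safe when no embedded resource type was recorded. A minimal sketch of that dispatch follows. The `ResourceType` enum is hypothetical and only stands in for `TikaCoreProperties.EmbeddedResourceType`.

```java
/** Hypothetical stand-in for TikaCoreProperties.EmbeddedResourceType. */
enum ResourceType { INLINE, ATTACHMENT }

final class EmbedDispatchSketch {

    /** Constant-first equals: a missing metadata value (null) simply means "not inline". */
    static boolean isInline(String embeddedResourceTypeValue) {
        return ResourceType.INLINE.toString().equals(embeddedResourceTypeValue);
    }

    /** Inline resources are concatenated into the parent; everything else is spawned. */
    static String route(String embeddedResourceTypeValue) {
        return isInline(embeddedResourceTypeValue) ? "concatenate" : "spawn";
    }
}
```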


Example 3: parse

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
private void parse(DocumentSummaryInformation summary) {
  set(OfficeOpenXMLExtended.COMPANY, summary.getCompany());
  set(OfficeOpenXMLExtended.MANAGER, summary.getManager());
  set(TikaCoreProperties.LANGUAGE, getLanguage(summary));
  set(OfficeOpenXMLCore.CATEGORY, summary.getCategory());

  // New style counts
  set(Office.SLIDE_COUNT, summary.getSlideCount());
  if (summary.getSlideCount() > 0) {
    metadata.set(PagedText.N_PAGES, summary.getSlideCount());
  }
  // Old style, Tika 1.0 counts
  // TODO Remove these in Tika 2.0
  set(Metadata.COMPANY, summary.getCompany());
  set(Metadata.MANAGER, summary.getManager());
  set(MSOffice.SLIDE_COUNT, summary.getSlideCount());
  set(Metadata.CATEGORY, summary.getCategory());

  parse(summary.getCustomProperties());
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 21, Source file: SummaryExtractor.java
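`SummaryExtractor` routes every value through private `set(...)` helpers whose bodies are not shown here. A plausible shape for such helpers is sketched below, assuming they skip `null` strings and non-positive counts so that absent fields never reach the metadata. A plain map replaces Tika's `Metadata`, and the key names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of "set only if meaningful" helpers. The skip rules are assumptions,
 *  not the confirmed behavior of SummaryExtractor. */
final class GuardedSetterSketch {
    final Map<String, String> metadata = new LinkedHashMap<>();

    void set(String key, String value) {
        if (value != null) {              // absent summary fields are not recorded
            metadata.put(key, value);
        }
    }

    void set(String key, int value) {
        if (value > 0) {                  // treat a zero count as "not stored"
            metadata.put(key, Integer.toString(value));
        }
    }
}
```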


Example 4: resolveMetaDataKey

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
/**
 * Returns a resolved key that is common in other document types or returns
 * the specified metaDataLocalName if no common key could be found. The key
 * could be a simple String key, or could be a {@link Property}
 *
 * @param metaDataLocalName
 *          The localname of the element containing metadata
 * @return a resolved key that is common in other document types
 */
private Object resolveMetaDataKey(String metaDataLocalName) {
  Object metaDataKey = metaDataLocalName;
  if ("sf:authors".equals(metaDataQName)) {
    metaDataKey = TikaCoreProperties.CREATOR;
  } else if ("sf:title".equals(metaDataQName)) {
    metaDataKey = TikaCoreProperties.TITLE;
  } else if ("sl:SLCreationDateProperty".equals(metaDataQName)) {
    metaDataKey = TikaCoreProperties.CREATED;
  } else if ("sl:SLLastModifiedDateProperty".equals(metaDataQName)) {
    metaDataKey = Metadata.LAST_MODIFIED;
  } else if ("sl:language".equals(metaDataQName)) {
    metaDataKey = TikaCoreProperties.LANGUAGE;
  }
  return metaDataKey;
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 25, Source file: PagesContentHandler.java
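The `if`/`else` chain in `resolveMetaDataKey` is a lookup with a fallback to the original name. A table-driven sketch of the same idea follows. The right-hand key strings are illustrative stand-ins for the Tika `Property` objects, and unlike the original the sketch returns a `String` rather than an `Object`.

```java
import java.util.Map;

/** Table-driven sketch of key resolution; the mapped key names are illustrative. */
final class KeyResolverSketch {
    private static final Map<String, String> COMMON_KEYS = Map.of(
            "sf:authors", "dc:creator",
            "sf:title", "dc:title",
            "sl:SLCreationDateProperty", "dcterms:created",
            "sl:SLLastModifiedDateProperty", "Last-Modified",
            "sl:language", "dc:language");

    /** Returns the common key, or the original name when no mapping exists. */
    static String resolve(String qName) {
        return COMMON_KEYS.getOrDefault(qName, qName);
    }
}
```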


Example 5: getContentHandler

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
protected ContentHandler getContentHandler(
        ContentHandler handler, Metadata metadata, ParseContext context) {
    return new TeeContentHandler(
            super.getContentHandler(handler, metadata, context),
            getDublinCoreHandler(metadata, TikaCoreProperties.TITLE, "title"),
            getDublinCoreHandler(metadata, TikaCoreProperties.KEYWORDS, "subject"),
            getDublinCoreHandler(metadata, TikaCoreProperties.CREATOR, "creator"),
            getDublinCoreHandler(metadata, TikaCoreProperties.DESCRIPTION, "description"),
            getDublinCoreHandler(metadata, TikaCoreProperties.PUBLISHER, "publisher"),
            getDublinCoreHandler(metadata, TikaCoreProperties.CONTRIBUTOR, "contributor"),
            getDublinCoreHandler(metadata, TikaCoreProperties.CREATED, "date"),
            getDublinCoreHandler(metadata, TikaCoreProperties.TYPE, "type"),
            getDublinCoreHandler(metadata, TikaCoreProperties.FORMAT, "format"),
            getDublinCoreHandler(metadata, TikaCoreProperties.IDENTIFIER, "identifier"),
            getDublinCoreHandler(metadata, TikaCoreProperties.LANGUAGE, "language"),
            getDublinCoreHandler(metadata, TikaCoreProperties.RIGHTS, "rights"));
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 18, Source file: DcXMLParser.java
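`DcXMLParser` tees SAX events into one handler per Dublin Core element (`title`, `creator`, `date`, and so on). The JDK-only sketch below shows the underlying idea: it collects the text of every element in the Dublin Core namespace, keyed by local name. It is not Tika's implementation, and the `DublinCoreCollector` class is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects text of Dublin Core elements by local name, using only the JDK SAX parser. */
final class DublinCoreCollector extends DefaultHandler {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    final Map<String, List<String>> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inDcElement;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        inDcElement = DC_NS.equals(uri);
        text.setLength(0);                     // start a fresh text buffer per element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inDcElement) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (DC_NS.equals(uri)) {               // repeated elements (e.g. creator) accumulate
            values.computeIfAbsent(localName, k -> new ArrayList<>()).add(text.toString().trim());
        }
        inDcElement = false;
    }

    static Map<String, List<String>> collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // required so uri and localName are populated
            DublinCoreCollector handler = new DublinCoreCollector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```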


Example 6: testWord

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
/**
 * Test the plain text output of the Word converter
 * @throws Exception
 */
@Test
public void testWord() throws Exception {
    Metadata metadata = new Metadata();
    ContentHandler handler = new BodyContentHandler();
    ParseContext context = new ParseContext();

    InputStream input = getTestDocument("testWORD.docx");
    try {
        parser.parse(input, handler, metadata, context);
        
      
        assertEquals(
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                metadata.get(Metadata.CONTENT_TYPE));
        assertEquals("Sample Word Document", metadata.get(TikaCoreProperties.TITLE));
        assertEquals("Keith Bennett", metadata.get(TikaCoreProperties.CREATOR));
        assertEquals("Keith Bennett", metadata.get(Metadata.AUTHOR));
        assertTrue(handler.toString().contains("Sample Word Document"));
    } finally {
        input.close();
    }
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 27, Source file: OOXMLParserTest.java


Example 7: testWordCustomProperties

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Test
public void testWordCustomProperties() throws Exception {
   InputStream input = OOXMLParserTest.class.getResourceAsStream(
         "/test-documents/testWORD_custom_props.docx");
   Metadata metadata = new Metadata();

   try {
      ContentHandler handler = new BodyContentHandler(-1);
      ParseContext context = new ParseContext();
      context.set(Locale.class, Locale.US);
      new OOXMLParser().parse(input, handler, metadata, context);
   } finally {
      input.close();
   }

   assertEquals(
         "application/vnd.openxmlformats-officedocument.wordprocessingml.document", 
         metadata.get(Metadata.CONTENT_TYPE));
   assertEquals("EJ04325S",             metadata.get(TikaCoreProperties.CREATOR));
   assertEquals("Etienne Jouvin",       metadata.get(TikaCoreProperties.MODIFIER));
   assertEquals("Etienne Jouvin",       metadata.get(Metadata.LAST_AUTHOR));
   assertEquals("2011-07-29T16:52:00Z", metadata.get(TikaCoreProperties.CREATED));
   assertEquals("2011-07-29T16:52:00Z", metadata.get(Metadata.CREATION_DATE));
   assertEquals("2012-01-03T22:14:00Z", metadata.get(TikaCoreProperties.MODIFIED));
   assertEquals("2012-01-03T22:14:00Z", metadata.get(Metadata.DATE));
   assertEquals("Microsoft Office Word",metadata.get(Metadata.APPLICATION_NAME));
   assertEquals("Microsoft Office Word",metadata.get(OfficeOpenXMLExtended.APPLICATION));
   assertEquals("1",                    metadata.get(Office.PAGE_COUNT));
   assertEquals("2",                    metadata.get(Office.WORD_COUNT));
   assertEquals("My Title",             metadata.get(TikaCoreProperties.TITLE));
   assertEquals("My Keyword",           metadata.get(TikaCoreProperties.KEYWORDS));
   assertEquals("Normal.dotm",          metadata.get(Metadata.TEMPLATE));
   assertEquals("Normal.dotm",          metadata.get(OfficeOpenXMLExtended.TEMPLATE));
   // TODO: Remove subject in Tika 2.0
   assertEquals("My subject",           metadata.get(Metadata.SUBJECT));
   assertEquals("My subject",           metadata.get(OfficeOpenXMLCore.SUBJECT));
   assertEquals("EDF-DIT",              metadata.get(TikaCoreProperties.PUBLISHER));
   assertEquals("true",                 metadata.get("custom:myCustomBoolean"));
   assertEquals("3",                    metadata.get("custom:myCustomNumber"));
   assertEquals("MyStringValue",        metadata.get("custom:MyCustomString"));
   assertEquals("2010-12-30T23:00:00Z", metadata.get("custom:MyCustomDate"));
   assertEquals("2010-12-29T22:00:00Z", metadata.get("custom:myCustomSecondDate"));
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 44, Source file: OOXMLParserTest.java
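The date assertions in this test compare against ISO-8601 strings in UTC, such as `2011-07-29T16:52:00Z`. The sketch below shows how a local wall-clock time maps to that form with `java.time`. The zone and values are illustrative assumptions, not taken from the test documents.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/** Renders a local date-time in a given zone as an ISO-8601 UTC string ending in "Z". */
final class UtcDateSketch {
    static String toUtcIso(LocalDateTime local, ZoneId zone) {
        // Instant.toString() uses DateTimeFormatter.ISO_INSTANT and always prints seconds
        return ZonedDateTime.of(local, zone).toInstant().toString();
    }
}
```

For example, midnight on 2010-12-31 in `Europe/Paris` (UTC+1 in winter) becomes `2010-12-30T23:00:00Z`.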


Example 8: testCustomMetadata

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Test
public void testCustomMetadata() throws Exception {
    Parser parser = new AutoDetectParser(); // Should auto-detect!
    Metadata metadata = new Metadata();

    InputStream stream = PDFParserTest.class.getResourceAsStream(
            "/test-documents/testPDF-custommetadata.pdf");

    String content = getText(stream, parser, metadata);

    assertEquals("application/pdf", metadata.get(Metadata.CONTENT_TYPE));
    assertEquals("Document author", metadata.get(TikaCoreProperties.CREATOR));
    assertEquals("Document author", metadata.get(Metadata.AUTHOR));
    assertEquals("Document title", metadata.get(TikaCoreProperties.TITLE));
    
    assertEquals("Custom Value", metadata.get("Custom Property"));
    
    assertEquals("Array Entry 1", metadata.get("Custom Array"));
    assertEquals(2, metadata.getValues("Custom Array").length);
    assertEquals("Array Entry 1", metadata.getValues("Custom Array")[0]);
    assertEquals("Array Entry 2", metadata.getValues("Custom Array")[1]);
    
    assertTrue(content.contains("Hello World!"));
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 25, Source file: PDFParserTest.java
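This test relies on multi-valued metadata: `get` returns the first value of `"Custom Array"`, while `getValues` returns all of them. The class below is a simplified, hypothetical model of those semantics (`add` appends, `set` replaces). It is not Tika's `Metadata` implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of multi-valued metadata; not Tika's class. */
final class MultiValuedMetadataSketch {
    private final Map<String, List<String>> data = new LinkedHashMap<>();

    /** Appends a value, keeping any existing ones. */
    void add(String name, String value) {
        data.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Replaces all existing values with a single value. */
    void set(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        data.put(name, values);
    }

    /** Returns the first value, or null if the name is absent. */
    String get(String name) {
        List<String> values = data.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    /** Returns all values, or an empty array if the name is absent. */
    String[] getValues(String name) {
        return data.getOrDefault(name, List.of()).toArray(new String[0]);
    }
}
```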


Example 9: testCustomProperties

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
/**
 * Ensures that custom OLE2 (HPSF) properties are extracted
 */
@Test
public void testCustomProperties() throws Exception {
   InputStream input = ExcelParserTest.class.getResourceAsStream(
         "/test-documents/testEXCEL_custom_props.xls");
   Metadata metadata = new Metadata();
   
   try {
      ContentHandler handler = new BodyContentHandler(-1);
      ParseContext context = new ParseContext();
      context.set(Locale.class, Locale.US);
      new OfficeParser().parse(input, handler, metadata, context);
   } finally {
      input.close();
   }
   
   assertEquals("application/vnd.ms-excel", metadata.get(Metadata.CONTENT_TYPE));
   assertEquals("",                     metadata.get(TikaCoreProperties.CREATOR));
   assertEquals("",                     metadata.get(TikaCoreProperties.MODIFIER));
   assertEquals("2011-08-22T13:45:54Z", metadata.get(TikaCoreProperties.MODIFIED));
   assertEquals("2006-09-12T15:06:44Z", metadata.get(TikaCoreProperties.CREATED));
   assertEquals("Microsoft Excel",      metadata.get(OfficeOpenXMLExtended.APPLICATION));
   assertEquals("true",                 metadata.get("custom:myCustomBoolean"));
   assertEquals("3",                    metadata.get("custom:myCustomNumber"));
   assertEquals("MyStringValue",        metadata.get("custom:MyCustomString"));
   assertEquals("2010-12-30T22:00:00Z", metadata.get("custom:MyCustomDate"));
   assertEquals("2010-12-29T22:00:00Z", metadata.get("custom:myCustomSecondDate"));
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 31, Source file: ExcelParserTest.java


Example 10: FileTikaMediaImpl

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
FileTikaMediaImpl(File file, Metadata md)
        throws IOException {
    this.file = file;
    this.id = file.getCanonicalPath();
    this.lastModified = new Date(file.lastModified());

    artist = md.get(XMPDM.ARTIST);
    album = md.get(XMPDM.ALBUM);
    genre = md.get(XMPDM.GENRE);
    try {
        year = Integer.parseInt(md.get(XMPDM.RELEASE_DATE));
    } catch (NumberFormatException nfEx) {
        year = -1;
    }
    requestedBy = md.get(XMPDM.LOG_COMMENT);
    title = md.get(TikaCoreProperties.TITLE);
}
 
Developer ID: KolonelKustard, Project: discodj, Lines of code: 18, Source file: FileTikaMediaImpl.java
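The year field relies on `Integer.parseInt` throwing `NumberFormatException` both for non-numeric text and for a missing tag (`parseInt(null)` throws it too). A small helper makes both cases explicit. The explicit null check and the `trim()` call are additions of this sketch, not part of the original constructor.

```java
/** Sketch of the year handling in FileTikaMediaImpl; returns -1 when no year can be read. */
final class YearParseSketch {
    static int parseYear(String releaseDate) {
        if (releaseDate == null) {
            return -1;                              // tag absent
        }
        try {
            return Integer.parseInt(releaseDate.trim());
        } catch (NumberFormatException e) {
            return -1;                              // e.g. a full date like "2004-05-01"
        }
    }
}
```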


Example 11: setDocumentFeatures

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
private void setDocumentFeatures(Metadata metadata, Document doc) {
  FeatureMap fmap = doc.getFeatures();
  setTikaFeature(metadata, TikaCoreProperties.TITLE, fmap);
  setTikaFeature(metadata, Office.AUTHOR, fmap);
  setTikaFeature(metadata, TikaCoreProperties.COMMENTS, fmap);
  setTikaFeature(metadata, TikaCoreProperties.CREATOR, fmap);
  if (fmap.get("AUTHORS") == null && fmap.get("AUTHOR") != null)
    fmap.put("AUTHORS", fmap.get(Office.AUTHOR));
  fmap.put("MimeType", metadata.get(Metadata.CONTENT_TYPE));
}
 
Developer ID: GateNLP, Project: gate-core, Lines of code: 11, Source file: TikaFormat.java


Example 12: resolveMetadataKey

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
private Property resolveMetadataKey(String localName) {
    if ("authors".equals(localName)) {
        return TikaCoreProperties.CREATOR;
    }
    if ("title".equals(localName)) {
        return TikaCoreProperties.TITLE;
    }
    if ("comment".equals(localName)) {
        return TikaCoreProperties.COMMENTS;
    }
    return Property.internalText(localName);
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 13, Source file: NumbersContentHandler.java


Example 13: parse

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Override
public void parse(
        InputStream stream, ContentHandler handler,
        Metadata metadata, ParseContext context)
        throws IOException, SAXException, TikaException {
    super.parse(stream, handler, metadata, context);
    // Copy subject to description for OO2
    String odfSubject = metadata.get(OfficeOpenXMLCore.SUBJECT);
    if (odfSubject != null && !odfSubject.equals("") && 
            (metadata.get(TikaCoreProperties.DESCRIPTION) == null || metadata.get(TikaCoreProperties.DESCRIPTION).equals(""))) {
        metadata.set(TikaCoreProperties.DESCRIPTION, odfSubject);
    }
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 14, Source file: OpenDocumentMetaParser.java
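The override copies the ODF subject into the description only when a subject is present and the description is missing or empty. The same rule is shown below over a plain `Map`. The `"subject"` and `"description"` keys are hypothetical stand-ins for the Tika properties.

```java
import java.util.Map;

/** Sketch of the subject-to-description fallback using plain String keys. */
final class DescriptionFallbackSketch {
    static void copySubjectToDescription(Map<String, String> md) {
        String subject = md.get("subject");
        String description = md.get("description");
        // Only fill the description if it is missing or empty and a subject exists
        if (subject != null && !subject.isEmpty()
                && (description == null || description.isEmpty())) {
            md.put("description", subject);
        }
    }
}
```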


Example 14: endElement

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Override
public void endElement(
        String uri, String local, String name) throws SAXException {
    if (bodyLevel > 0 && discardLevel == 0) {
        String safe = mapper.mapSafeElement(name);
        if (safe != null) {
            xhtml.endElement(safe);
        } else if (XHTMLContentHandler.ENDLINE.contains(
                name.toLowerCase(Locale.ENGLISH))) {
            // TIKA-343: Replace closing block tags (and <br/>) with a
            // newline unless the HtmlMapper above has already mapped
            // them to something else
            xhtml.newline();
        }
    }

    if (titleLevel > 0) {
        titleLevel--;
        if (titleLevel == 0) {
            metadata.set(TikaCoreProperties.TITLE, title.toString().trim());
        }
    }
    if (bodyLevel > 0) {
        bodyLevel--;
    }
    if (discardLevel > 0) {
        discardLevel--;
    }
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 30, Source file: HtmlHandler.java
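`titleLevel` tracks how deep the handler currently is inside `<title>`, and the title is stored only when the outermost title element closes. The JDK-only sketch below reproduces that bookkeeping with the built-in SAX parser. It assumes well-formed XHTML input, whereas Tika's HTML parser also copes with malformed markup; the `TitleCapture` class is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Captures the trimmed text of the outermost <title> element using a nesting counter. */
final class TitleCapture extends DefaultHandler {
    private int titleLevel;
    private final StringBuilder title = new StringBuilder();
    String result;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (titleLevel > 0 || "title".equalsIgnoreCase(qName)) {
            titleLevel++;                           // entering <title> or an element nested in it
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (titleLevel > 0) {
            title.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (titleLevel > 0) {
            titleLevel--;
            if (titleLevel == 0) {
                result = title.toString().trim();   // analogous to metadata.set(TITLE, ...)
            }
        }
    }

    static String extract(String xhtml) {
        try {
            TitleCapture handler = new TitleCapture();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
    }
}
```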


Example 15: parse

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
public void parse(InputStream file) throws IOException, TikaException {
    ByteArrayOutputStream xmpraw = new ByteArrayOutputStream();
    if (!scanner.parse(file, xmpraw)) {
        return;
    }

    Reader decoded = new InputStreamReader(
            new ByteArrayInputStream(xmpraw.toByteArray()),
            DEFAULT_XMP_CHARSET);
    try {
        XMPMetadata xmp = XMPMetadata.load(new InputSource(decoded));
        XMPSchemaDublinCore dc = xmp.getDublinCoreSchema();
        if (dc != null) {
            if (dc.getTitle() != null) {
                metadata.set(TikaCoreProperties.TITLE, dc.getTitle());
            }
            if (dc.getDescription() != null) {
                metadata.set(TikaCoreProperties.DESCRIPTION, dc.getDescription());
            }
            if (dc.getCreators() != null && dc.getCreators().size() > 0) {
                metadata.set(TikaCoreProperties.CREATOR, joinCreators(dc.getCreators()));
            }
            if (dc.getSubjects() != null && dc.getSubjects().size() > 0) {
                Iterator<String> keywords = dc.getSubjects().iterator();
                while (keywords.hasNext()) {
                    metadata.add(TikaCoreProperties.KEYWORDS, keywords.next());
                }
                // TODO should we set KEYWORDS too?
                // All tested photo managers set the same in Iptc.Application2.Keywords and Xmp.dc.subject
            }
        }
    } catch (IOException e) {
        // Could not parse embedded XMP metadata. That's not a serious
        // problem, so we'll just ignore the issue for now.
        // TODO: Make error handling like this configurable.
    }
}
 
Developer ID: kolbasa, Project: OCRaptor, Lines of code: 38, Source file: JempboxExtractor.java
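`joinCreators` is called above, but its body is not part of the example. A hypothetical version that merges several `dc:creator` entries into one value might look like this. The `", "` separator is an assumption.

```java
import java.util.List;

/** Hypothetical joinCreators: merges multiple creators into a single string. */
final class CreatorJoinSketch {
    static String joinCreators(List<String> creators) {
        if (creators == null || creators.isEmpty()) {
            return null;                          // nothing to join
        }
        return String.join(", ", creators);       // separator is an assumption
    }
}
```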


Example 16: testExcel

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Test
public void testExcel() throws Exception {
    Metadata metadata = new Metadata(); 
    ContentHandler handler = new BodyContentHandler();
    ParseContext context = new ParseContext();
    context.set(Locale.class, Locale.US);

    InputStream input = getTestDocument("testEXCEL.xlsx");
    try {
        parser.parse(input, handler, metadata, context);

        assertEquals(
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
                metadata.get(Metadata.CONTENT_TYPE));
        assertEquals("Simple Excel document", metadata.get(TikaCoreProperties.TITLE));
        assertEquals("Keith Bennett", metadata.get(TikaCoreProperties.CREATOR));
        assertEquals("Keith Bennett", metadata.get(Metadata.AUTHOR));
        String content = handler.toString();
        assertTrue(content.contains("Sample Excel Worksheet"));
        assertTrue(content.contains("Numbers and their Squares"));
        assertTrue(content.contains("9"));
        assertFalse(content.contains("9.0"));
        assertTrue(content.contains("196"));
        assertFalse(content.contains("196.0"));
        assertEquals("false", metadata.get(TikaMetadataKeys.PROTECTED));
    } finally {
        input.close();
    }
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 30, Source file: OOXMLParserTest.java
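The `assertFalse(content.contains("9.0"))` checks expect integer-valued cells to be rendered without a trailing `.0`. One way to produce that output for a `double` is shown below. It uses `BigDecimal` purely for illustration and is not the formatter that Tika or Apache POI actually applies.

```java
import java.math.BigDecimal;

/** Formats a double so integer values print without ".0" (9.0 becomes "9"). */
final class CellNumberSketch {
    static String format(double value) {
        // valueOf uses Double.toString; stripping zeros then printing plainly avoids "9.0" and "1.96E+2"
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }
}
```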


Example 17: testExcelCustomProperties

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
/**
 * Ensures that custom OOXML properties are extracted
 */
@Test
public void testExcelCustomProperties() throws Exception {
   InputStream input = OOXMLParserTest.class.getResourceAsStream(
         "/test-documents/testEXCEL_custom_props.xlsx");
   Metadata metadata = new Metadata();
   
   try {
      ContentHandler handler = new BodyContentHandler(-1);
      ParseContext context = new ParseContext();
      context.set(Locale.class, Locale.US);
      new OOXMLParser().parse(input, handler, metadata, context);
   } finally {
      input.close();
   }
   
   assertEquals(
         "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", 
         metadata.get(Metadata.CONTENT_TYPE));
   assertEquals(null,                   metadata.get(TikaCoreProperties.CREATOR));
   assertEquals(null,                   metadata.get(TikaCoreProperties.MODIFIER));
   assertEquals("2006-09-12T15:06:44Z", metadata.get(TikaCoreProperties.CREATED));
   assertEquals("2006-09-12T15:06:44Z", metadata.get(Metadata.CREATION_DATE));
   assertEquals("2011-08-22T14:24:38Z", metadata.get(Metadata.LAST_MODIFIED));
   assertEquals("2011-08-22T14:24:38Z", metadata.get(TikaCoreProperties.MODIFIED));
   assertEquals("2011-08-22T14:24:38Z", metadata.get(Metadata.DATE));
   assertEquals("Microsoft Excel",      metadata.get(Metadata.APPLICATION_NAME));
   assertEquals("Microsoft Excel",      metadata.get(OfficeOpenXMLExtended.APPLICATION));
   assertEquals("true",                 metadata.get("custom:myCustomBoolean"));
   assertEquals("3",                    metadata.get("custom:myCustomNumber"));
   assertEquals("MyStringValue",        metadata.get("custom:MyCustomString"));
   assertEquals("2010-12-30T22:00:00Z", metadata.get("custom:MyCustomDate"));
   assertEquals("2010-12-29T22:00:00Z", metadata.get("custom:myCustomSecondDate"));
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 37, Source file: OOXMLParserTest.java


Example 18: testPowerPointCustomProperties

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Test
public void testPowerPointCustomProperties() throws Exception {
   InputStream input = OOXMLParserTest.class.getResourceAsStream(
         "/test-documents/testPPT_custom_props.pptx");
   Metadata metadata = new Metadata();

   try {
      ContentHandler handler = new BodyContentHandler(-1);
      ParseContext context = new ParseContext();
      context.set(Locale.class, Locale.US);
      new OOXMLParser().parse(input, handler, metadata, context);
   } finally {
      input.close();
   }

   assertEquals(
         "application/vnd.openxmlformats-officedocument.presentationml.presentation", 
         metadata.get(Metadata.CONTENT_TYPE));
   assertEquals("JOUVIN ETIENNE",       metadata.get(TikaCoreProperties.CREATOR));
   assertEquals("EJ04325S",             metadata.get(TikaCoreProperties.MODIFIER));
   assertEquals("EJ04325S",             metadata.get(Metadata.LAST_AUTHOR));
   assertEquals("2011-08-22T13:30:53Z", metadata.get(TikaCoreProperties.CREATED));
   assertEquals("2011-08-22T13:30:53Z", metadata.get(Metadata.CREATION_DATE));
   assertEquals("2011-08-22T13:32:49Z", metadata.get(TikaCoreProperties.MODIFIED));
   assertEquals("2011-08-22T13:32:49Z", metadata.get(Metadata.DATE));
   assertEquals("1",                    metadata.get(Office.SLIDE_COUNT));
   assertEquals("3",                    metadata.get(Office.WORD_COUNT));
   assertEquals("Test extraction properties pptx", metadata.get(TikaCoreProperties.TITLE));
   assertEquals("true",                 metadata.get("custom:myCustomBoolean"));
   assertEquals("3",                    metadata.get("custom:myCustomNumber"));
   assertEquals("MyStringValue",        metadata.get("custom:MyCustomString"));
   assertEquals("2010-12-30T22:00:00Z", metadata.get("custom:MyCustomDate"));
   assertEquals("2010-12-29T22:00:00Z", metadata.get("custom:myCustomSecondDate"));
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 35, Source file: OOXMLParserTest.java


Example 19: testParseNumbers

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Test
public void testParseNumbers() throws Exception {
    InputStream input = IWorkParserTest.class.getResourceAsStream("/test-documents/testNumbers.numbers");
    Metadata metadata = new Metadata();
    ContentHandler handler = new BodyContentHandler();

    iWorkParser.parse(input, handler, metadata, parseContext);

    // Make sure enough keys came through
    // (Exact numbers will vary based on composites)
    assertTrue("Insufficient metadata found " + metadata.size(), metadata.size() >= 8);
    List<String> metadataKeys = Arrays.asList(metadata.names());
    assertTrue("Metadata not found in " + metadataKeys, metadataKeys.contains(Metadata.CONTENT_TYPE));
    assertTrue("Metadata not found in " + metadataKeys, metadataKeys.contains(Metadata.PAGE_COUNT.getName()));
    assertTrue("Metadata not found in " + metadataKeys, metadataKeys.contains(TikaCoreProperties.CREATOR.getName()));
    assertTrue("Metadata not found in " + metadataKeys, metadataKeys.contains(TikaCoreProperties.COMMENTS.getName()));
    assertTrue("Metadata not found in " + metadataKeys, metadataKeys.contains(Metadata.TITLE));
    assertTrue("Metadata not found in " + metadataKeys, metadataKeys.contains(TikaCoreProperties.TITLE.getName()));
    
    // Check the metadata values
    assertEquals("2", metadata.get(Metadata.PAGE_COUNT));
    assertEquals("Tika User", metadata.get(TikaCoreProperties.CREATOR));
    assertEquals("Account checking", metadata.get(TikaCoreProperties.TITLE));
    assertEquals("a comment", metadata.get(TikaCoreProperties.COMMENTS));

    String content = handler.toString();
    assertTrue(content.contains("Category"));
    assertTrue(content.contains("Home"));
    assertTrue(content.contains("-226"));
    assertTrue(content.contains("-137.5"));
    assertTrue(content.contains("Checking Account: 300545668"));
    assertTrue(content.contains("4650"));
    assertTrue(content.contains("Credit Card"));
    assertTrue(content.contains("Groceries"));
    assertTrue(content.contains("-210"));
    assertTrue(content.contains("Food"));
    assertTrue(content.contains("Try adding your own account transactions to this table."));
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 39, Source file: IWorkParserTest.java


Example 20: testPdfParsing

import org.apache.tika.metadata.TikaCoreProperties; // import the required package/class
@Test
public void testPdfParsing() throws Exception {
    Parser parser = new AutoDetectParser(); // Should auto-detect!
    Metadata metadata = new Metadata();

    InputStream stream = PDFParserTest.class.getResourceAsStream(
            "/test-documents/testPDF.pdf");

    String content = getText(stream, parser, metadata);

    assertEquals("application/pdf", metadata.get(Metadata.CONTENT_TYPE));
    assertEquals("Bertrand Delacr\u00e9taz", metadata.get(TikaCoreProperties.CREATOR));
    assertEquals("Bertrand Delacr\u00e9taz", metadata.get(Metadata.AUTHOR));
    assertEquals("Firefox", metadata.get(TikaCoreProperties.CREATOR_TOOL));
    assertEquals("Apache Tika - Apache Tika", metadata.get(TikaCoreProperties.TITLE));

    // Can't reliably test dates yet - see TIKA-451
    // assertEquals("Sat Sep 15 10:02:31 BST 2007", metadata.get(Metadata.CREATION_DATE));
    // assertEquals("Sat Sep 15 10:02:31 BST 2007", metadata.get(Metadata.LAST_MODIFIED));

    assertTrue(content.contains("Apache Tika"));
    assertTrue(content.contains("Tika - Content Analysis Toolkit"));
    assertTrue(content.contains("incubator"));
    assertTrue(content.contains("Apache Software Foundation"));
    // testing how the end of one paragraph is separated from start of the next one
    assertTrue("should have word boundary after headline",
            !content.contains("ToolkitApache"));
    assertTrue("should have word boundary between paragraphs",
            !content.contains("libraries.Apache"));
}
 
Developer ID: kanrourou, Project: software-testing, Lines of code: 31, Source file: PDFParserTest.java



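Note: The org.apache.tika.metadata.TikaCoreProperties examples in this article were collected from GitHub, MSDocs, and other source-code and documentation platforms. The snippets come from open-source projects contributed by their developers, and copyright remains with the original authors. Consult each project's license before distributing or using the code, and do not republish without permission.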

