Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

parsing - Java RTF Parser

Does anyone know of a robust RTF parser I can use in Java? I need to extract plain text, including international text. It would also be nice to extract embedded images and files. A C++ or other library that I can easily call would also work, and if there is good source code, I can port it to Java.

The following libraries either do not cover enough of the RTF format or fail to parse some valid RTF files:

  1. Java Swing's RTFEditorKit: quite basic and brittle. Apache Tika, Nutch, and lots of other tools use this.
  2. An RTF library from iText (com.lowagie.etc...): not too comprehensive.
  3. The etranslate RTF library: the most complete of the Java ones. I'm not sure whether there is an updated version, but the version I have fails on some of my RTF collection (the files are valid; at least they open fine in MS Word and OpenOffice).
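
For reference, this is roughly how the Swing baseline from item 1 is driven. It is a minimal sketch that uses only JDK classes; the class and method names (`SwingRtfBaseline`, `extractText`) are made up for illustration, and the sample input is deliberately trivial.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class SwingRtfBaseline {

    /** Extracts plain text from an RTF string with the JDK's RTFEditorKit. */
    static String extractText(String rtf) {
        try {
            RTFEditorKit kit = new RTFEditorKit();
            Document doc = kit.createDefaultDocument();
            // RTF is a 7-bit format on the wire, so ASCII bytes are sufficient here.
            kit.read(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII)), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            // Wrapped so callers of this sketch do not need checked-exception handling.
            throw new IllegalStateException("RTF could not be read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractText("{\\rtf1\\ansi Hello\\par}"));
    }
}
```

Running the same method over a larger collection of real-world files is a quick way to see where this baseline stops coping.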

There's a C# library that's reasonably complete, but alas, it's C# and not Java: http://www.codeproject.com/Articles/27431/Writing-Your-Own-RTF-Converter

I also looked into OpenOffice. It is probably very comprehensive, but it is too slow for what I need.

(I did do web searches and Stack Overflow searches before posting this question, so if you are referring me to an ancient "already asked" post, it probably doesn't have an answer there. But feel free to point it out, in case I missed it!)

1 Answer

You may find RTF Parser Kit useful. It provides a stream-based parser which delivers events to you as the document is parsed. There is a simple example text extractor provided which demonstrates how the API can be used.
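
To make the event-based idea concrete, here is a small self-contained sketch of a streaming RTF tokenizer that reports group, control-word, and text events to a listener. This is not RTF Parser Kit's API: the `Listener` interface, `TextCollector`, and every other name below are hypothetical, and the sketch handles only a simplified subset of RTF. In particular it reports `\'hh` escapes without decoding them, since that depends on the document's code page. For real documents, use the library's own listener interface and the example text extractor it ships with.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

public class RtfEventSketch {

    /** Hypothetical callback interface for this sketch; not RTF Parser Kit's API. */
    interface Listener {
        void groupStart();
        void groupEnd();
        void control(String name, Integer parameter);
        void text(String chars);
    }

    /** Tokenizes RTF from the reader and reports each token as an event. */
    static void parse(Reader in, Listener out) throws IOException {
        StringBuilder pending = new StringBuilder();
        int c = in.read();
        while (c != -1) {
            if (c == '\\') {
                c = in.read();
                if (c == '\\' || c == '{' || c == '}') {
                    pending.append((char) c);            // escaped literal character
                    c = in.read();
                } else {
                    flush(pending, out);
                    c = readControl(c, in, out);
                }
            } else if (c == '{' || c == '}') {
                flush(pending, out);
                if (c == '{') out.groupStart(); else out.groupEnd();
                c = in.read();
            } else {
                if (c != '\r' && c != '\n') pending.append((char) c);  // raw line breaks are not content
                c = in.read();
            }
        }
        flush(pending, out);
    }

    /** Reads one control word or symbol starting at c; returns the next unread character. */
    private static int readControl(int c, Reader in, Listener out) throws IOException {
        if (c == -1) return -1;                           // trailing backslash
        if (c == '\'') {                                  // \'hh: byte in the document code page
            int hi = in.read(), lo = in.read();
            if (hi == -1 || lo == -1) return -1;
            out.control("'", Integer.parseInt("" + (char) hi + (char) lo, 16));
            return in.read();
        }
        if (!isLetter(c)) {                               // control symbol such as \* or \~
            out.control(String.valueOf((char) c), null);
            return in.read();
        }
        StringBuilder name = new StringBuilder();
        while (isLetter(c)) { name.append((char) c); c = in.read(); }
        StringBuilder digits = new StringBuilder();
        if (c == '-') { digits.append('-'); c = in.read(); }
        while (c >= '0' && c <= '9') { digits.append((char) c); c = in.read(); }
        String d = digits.toString();
        out.control(name.toString(), (d.isEmpty() || d.equals("-")) ? null : Integer.valueOf(d));
        return c == ' ' ? in.read() : c;                  // one space delimiter belongs to the control word
    }

    private static boolean isLetter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static void flush(StringBuilder pending, Listener out) {
        if (pending.length() > 0) { out.text(pending.toString()); pending.setLength(0); }
    }

    /** Listener that collects body text and skips a few common non-body groups. */
    static class TextCollector implements Listener {
        private static final Set<String> SKIPPED =
                Set.of("fonttbl", "colortbl", "stylesheet", "info", "pict", "*");
        private final Deque<Boolean> skipStack = new ArrayDeque<>();
        private boolean skipping = false;
        final StringBuilder result = new StringBuilder();

        @Override public void groupStart() { skipStack.push(skipping); }
        @Override public void groupEnd() { if (!skipStack.isEmpty()) skipping = skipStack.pop(); }
        @Override public void control(String name, Integer parameter) {
            if (SKIPPED.contains(name)) skipping = true;  // ignore the rest of this group
            else if (!skipping && (name.equals("par") || name.equals("line"))) result.append('\n');
            else if (!skipping && name.equals("tab")) result.append('\t');
        }
        @Override public void text(String chars) { if (!skipping) result.append(chars); }
    }

    /** Convenience wrapper: parses an RTF string and returns the collected text. */
    static String extractText(String rtf) {
        TextCollector collector = new TextCollector();
        try {
            parse(new StringReader(rtf), collector);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.result.toString();
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1\\ansi{\\fonttbl{\\f0 Arial;}}\\f0 Hello, \\b world\\b0 !\\par}";
        System.out.print(extractText(rtf));
    }
}
```

The point of the event model is that the tokenizer never builds a document tree, so memory use stays flat on large files and the listener decides what to keep: text only, as here, or also the raw data of embedded pictures and objects.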
