

java - Parsing an XML stream with no root element

I need to parse a continuous stream of well-formed XML elements, for which I am only given an already constructed java.io.Reader object. These elements are not enclosed in a root element, nor are they preceded by an XML header like <?xml version="1.0"?>, but are otherwise valid XML.

Using the Java org.xml.sax.XMLReader class does not work, because the XMLReader expects a well-formed document with a single enclosing root element. It reads the first element in the stream, treats it as the root, and fails on the next one with the typical error:

org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
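The failure can be reproduced in isolation with a minimal sketch like the one below. The class name RootlessFailureDemo and the tryParse helper are illustrative only; the sketch simply parses a string with a stock XMLReader and reports whether a SAXParseException was raised.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class RootlessFailureDemo {

    /** Returns the parse error message, or null if the input parsed cleanly. */
    static String tryParse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();          // the document was not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        // A single element is a complete document and parses fine.
        System.out.println(tryParse("<a>1</a>"));
        // Two sibling elements with no common root: the second one is rejected.
        System.out.println(tryParse("<a>1</a><a>2</a>"));
    }
}
```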

For files that lack a root element, but where such an element exists or can be defined (call it, say, MyRootElement), one can do something like the following:

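        import java.io.StringReader;
        import javax.xml.parsers.SAXParserFactory;
        import org.xml.sax.InputSource;
        import org.xml.sax.XMLReader;

        String path = "<the full path to the file>"; // placeholder

        XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Build a wrapper document whose root element pulls in the file's
        // content through an external entity reference (&data;).
        StringBuilder buffer = new StringBuilder();

        buffer.append("<?xml version=\"1.0\"?>\n");
        buffer.append("<!DOCTYPE MyRootElement ");
        buffer.append("[<!ENTITY data SYSTEM \"file:///");
        buffer.append(path);
        buffer.append("\">]>\n");
        buffer.append("<MyRootElement xmlns:...>\n");
        buffer.append("&data;\n");
        buffer.append("</MyRootElement>\n");

        InputSource source = new InputSource(new StringReader(buffer.toString()));

        xmlReader.parse(source);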

I have tested the above by saving part of the java.io.Reader output to a file, and it works. However, this approach does not apply in my case: the extra information (XML header, root element) cannot be inserted, because the java.io.Reader object passed to my code is already constructed.

Essentially, I am looking for "fragmented XML parsing". So my question is: can it be done using standard Java APIs (including the org.xml.sax.* and javax.xml.* packages)?


1 Answer


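java.io.SequenceInputStream comes to the rescue: it concatenates several streams, so you can surround the root-less content with a dummy opening and closing tag and let the parser see a single root element: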

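    import java.io.ByteArrayInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.SequenceInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.Collections;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;

    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    SAXParser parser = saxFactory.newSAXParser();

    // Feed the parser "<dummy>", then the root-less content, then "</dummy>",
    // so it sees one well-formed document with <dummy> as its root.
    parser.parse(
        new SequenceInputStream(
            Collections.enumeration(Arrays.asList(
            new InputStream[] {
                new ByteArrayInputStream("<dummy>".getBytes(StandardCharsets.UTF_8)),
                new FileInputStream(file), // XML fragments without a root element
                new ByteArrayInputStream("</dummy>".getBytes(StandardCharsets.UTF_8)),
            }))
        ),
        new DefaultHandler()
    );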
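The code above wraps a FileInputStream, but the question only provides an already constructed java.io.Reader. As far as I know the JDK has no Reader counterpart of SequenceInputStream, so the following is a minimal sketch assuming a small hand-written helper, ConcatReader (hypothetical, not a JDK class), that reads from several Readers in turn. It also assumes the fragments contain no XML declaration of their own. The handler tracks element depth so that each top-level fragment (depth 2, directly under the synthetic <dummy> root) can be picked out; RootlessReaderDemo and topLevelElements are illustrative names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RootlessReaderDemo {

    /** Hypothetical helper: character-stream analogue of SequenceInputStream. */
    static final class ConcatReader extends Reader {
        private final Reader[] parts;
        private int current = 0;

        ConcatReader(Reader... parts) {
            this.parts = parts;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            while (current < parts.length) {
                int n = parts[current].read(buf, off, len);
                if (n != -1) {
                    return n;       // delegate to the current part
                }
                current++;          // current part exhausted, move to the next
            }
            return -1;              // every part is exhausted
        }

        @Override
        public void close() throws IOException {
            for (Reader r : parts) {
                r.close();
            }
        }
    }

    /** Returns the element name of each top-level fragment read from the given Reader. */
    static List<String> topLevelElements(Reader fragments) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;  // 1 = synthetic <dummy> root, 2 = a fragment

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                depth++;
                if (depth == 2) {
                    names.add(qName);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        Reader wrapped = new ConcatReader(
                new StringReader("<dummy>"), fragments, new StringReader("</dummy>"));
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(wrapped), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        Reader given = new StringReader("<item id=\"1\"/><item id=\"2\"/><note>hi</note>");
        System.out.println(topLevelElements(given)); // prints [item, item, note]
    }
}
```

Because SAX reports events while it reads, each fragment should reach the handler as soon as its characters are available; the synthetic closing tag is only read once the underlying Reader reaches end of stream.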
