java - why org.apache.xerces.parsers.SAXParser does not skip BOM in utf8 encoded xml?

Question

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have an XML file encoded as UTF-8, and it contains a BOM at the beginning. During parsing I get org.xml.sax.SAXParseException: Content is not allowed in prolog. I cannot remove those 3 bytes from the files, and I cannot load a file into memory to strip them there because the files are big. For performance reasons I am using a SAX parser, and I just want to skip those 3 bytes if they are present before the first tag. Should I subclass InputStreamReader for this?

I'm new to Java, so please show me the right way.

1 Answer

深蓝 · Answer 1 · 2021-10-17T03:08:59+0000

This has come up before, and I found the answer on Stack Overflow when it happened to me. The linked answer uses a PushbackInputStream to test for the BOM.
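
For reference, here is a minimal sketch of that PushbackInputStream idea. It is not the code from the linked answer; the class name BomSkipper and the method skipUtf8Bom are made up for illustration, and it uses the JDK's javax.xml.parsers.SAXParserFactory rather than instantiating org.apache.xerces.parsers.SAXParser directly. It assumes the parser is fed a byte stream and that only the UTF-8 BOM (EF BB BF) needs handling. Nothing is buffered beyond the first three bytes, so large files still stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xerces or the JDK.
public class BomSkipper {
    private static final int BOM_LENGTH = 3;

    // Returns a stream positioned after the UTF-8 BOM if one is present,
    // or at the very first byte otherwise.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, BOM_LENGTH);
        byte[] head = new byte[BOM_LENGTH];

        // read() may return fewer bytes than requested, so loop until we
        // have three bytes or hit end of stream.
        int total = 0;
        while (total < BOM_LENGTH) {
            int n = pushback.read(head, total, BOM_LENGTH - total);
            if (n < 0) {
                break;
            }
            total += n;
        }

        boolean hasBom = total == BOM_LENGTH
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: push back whatever was read so the parser sees it.
        if (!hasBom && total > 0) {
            pushback.unread(head, 0, total);
        }
        return pushback;
    }

    public static void main(String[] args) throws Exception {
        // Build a small in-memory document with a BOM in front, for the demo only.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + BOM_LENGTH];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, BOM_LENGTH, xml.length);

        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom))) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler());
        }
        System.out.println("parsed");
    }
}
```

For a real file you would pass a FileInputStream (ideally wrapped in a BufferedInputStream) to skipUtf8Bom and hand the result to your parser. If you need a Reader instead, wrap the returned stream in an InputStreamReader with UTF-8, so subclassing InputStreamReader should not be necessary.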
