Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
325 views
in Technique[技术] by (71.8m points)

java - Use jsoup to parse XML - prevent jsoup from "cleaning" <link> tags

In most case, I have no problem with using jsoup to parse XML. However, if there are <link> tags in the XML document, jsoup will change <link>some text here</link> to <link />some text here. This makes it impossible to extract text inside the <link> tag using CSS selector.

So how to prevent jsoup from "cleaning" <link> tags?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

In jsoup 1.6.2 I have added an XML parser mode, which parses the input as-is, without applying the HTML5 parse rules (contents of element, document structure, etc). This mode will keep text in a <link> tag, and allow multiples of it, etc.

Here's an example:

String xml = "<link>One</link><link>Two</link>";
Document xmlDoc = Jsoup.parse(xml, "", Parser.xmlParser());

Elements links = xmlDoc.select("link");
System.out.println("Link text 1: " + links.get(0).text());
System.out.println("Link text 2: " + links.get(1).text());

Returns:

Link text 1: One
Link text 2: Two

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...