Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

xml - How to handle < and > in CDATA with xslt 1.0?

I have a problem with < and > in a CDATA section that is presented using XSLT 1.0. Valid HTML tags work fine when I use disable-output-escaping="yes" for the element. But sometimes the text also contains other content that looks like a tag, for example <XYZ>. This is not visible in the browser. I guess the browser interprets it as an HTML tag, but since it is not a valid one, it just ignores it.

Example of a CDATA text element:

<Text>The fox jumps &lt;em&gt;over&lt;/&gt; the fence. The fence is &lt;XYZ&gt; meters high.</Text>

Current XSL code that formats this text (works as described above):

<div class="casebook-memo-key-text"><xsl:value-of disable-output-escaping="yes" select="p:Text"/>&#160;</div> 
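To see why <XYZ> disappears, it can help to look at how an HTML parser tokenizes what disable-output-escaping="yes" writes out. The sketch below uses Python's standard html.parser as a stand-in for a browser. This is an assumption: it is not a browser engine, but it applies the same basic rule that "<" followed by a letter starts a tag. The names EventRecorder and parse are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Records the start tags and text chunks an HTML parser sees."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &lt; and &gt; become < and > in text
        self.start_tags = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)  # tag names are reported in lowercase

    def handle_data(self, data):
        self.text_chunks.append(data)


def parse(html_text):
    """Feed a string to the parser and return the recorder."""
    recorder = EventRecorder()
    recorder.feed(html_text)
    recorder.close()
    return recorder


# Roughly what disable-output-escaping="yes" emits: the brackets are raw.
unescaped = parse("The fence is <XYZ> meters high.")
print(unescaped.start_tags)             # ['xyz']: treated as an element, not text
print("".join(unescaped.text_chunks))   # the letters XYZ are not part of the text

# What plain xsl:value-of (escaping on) emits: the brackets stay as entities.
escaped = parse("The fence is &lt;XYZ&gt; meters high.")
print(escaped.start_tags)               # []
print("".join(escaped.text_chunks))     # The fence is <XYZ> meters high.
```

The escaped form displays the brackets literally, but it would also display <em> literally, which is the trade-off the question is about.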

I could replace the invalid ones with, for example, -XYZ- using translate(), but how do I avoid replacing the valid HTML tags? Any ideas?
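One caveat about the translate() idea: XPath 1.0 translate() works character by character, not on substrings, so translate(p:Text, '<>', '--') rewrites every < and >, including those of <em>. The Python function below mimics the XPath rule (each character in the second argument maps to the character at the same position in the third; characters with no counterpart are removed) to show the effect. xpath_translate is a hypothetical name, not part of any library.

```python
def xpath_translate(s, from_chars, to_chars):
    """Emulate XPath 1.0 translate(): per-character mapping, extras deleted."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # XPath uses the first occurrence of a repeated character
        # Characters beyond the length of to_chars map to None, i.e. are deleted.
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return s.translate(table)


text = "The fox jumps <em>over</em> the fence. The fence is <XYZ> meters high."
print(xpath_translate(text, "<>", "--"))
# The fox jumps -em-over-/em- the fence. The fence is -XYZ- meters high.
```

Telling <em> apart from <XYZ> needs knowledge of which tag names are acceptable. That is awkward in XSLT 1.0, which has no regular-expression functions, and is usually easier to do where the text is produced.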

Question from: https://stackoverflow.com/questions/66046396/how-to-handle-and-in-cdata-with-xslt-1-0


1 Answer


I think disable-output-escaping is your best bet for this particular data with XSLT 1.0.

Even better would be to change how the source XML is generated, if possible, so that it produces well-formed XML. But that can be difficult if you don't control how the HTML is constructed, and it may just shift the work from output to input.
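If you do control the generator, one way to fix the data on input is to keep a whitelist of acceptable HTML tags and escape everything else that looks like a tag before the string is stored. The sketch below assumes the generator is written in Python; the tag list in ALLOWED_TAGS and the function name escape_unknown_tags are hypothetical, and the simple regular expression is not a full HTML parser.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical whitelist: tags that should reach the browser as real markup.
ALLOWED_TAGS = {"em", "strong", "b", "i", "u", "br", "p"}

# A "<", an optional "/", a tag name, then anything up to ">" (no nested brackets).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def escape_unknown_tags(text):
    """Escape tag-like sequences whose name is not whitelisted."""

    def replace(match):
        if match.group(2).lower() in ALLOWED_TAGS:
            return match.group(0)  # keep real HTML tags untouched
        return escape(match.group(0), quote=False)  # <XYZ> -> &lt;XYZ&gt;

    return TAG_RE.sub(replace, text)


raw = "The fox jumps <em>over</em> the fence. The fence is <XYZ> meters high."
cleaned = escape_unknown_tags(raw)
print(cleaned)

# Storing it in the XML escapes it once more, so after disable-output-escaping
# the browser receives <em>...</em> as markup and &lt;XYZ&gt; as visible text.
element = ET.Element("Text")
element.text = cleaned
print(ET.tostring(element, encoding="unicode"))
```

With this pre-processing, the existing xsl:value-of with disable-output-escaping="yes" could stay as it is.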

If someone forgets to close an element, or uses valid HTML syntax that is not well-formed XML, such as <br> or <img>, then the source XML would not be well-formed and could not be parsed. So the "safe" way is to jam it into CDATA, or escape the markup and put it in as text(). But then it becomes your problem to figure out how to turn it back into markup on the way out.
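The CDATA and escaped-markup options in the paragraph above look different in the source file but are the same to an XML parser, and therefore to the XSLT processor: both produce plain text. A quick check with Python's standard xml.etree.ElementTree (the helper names text_of and is_well_formed are hypothetical) also shows why raw HTML such as <br> cannot simply be embedded.

```python
import xml.etree.ElementTree as ET


def text_of(xml_text):
    """Parse a <Text> element and return its character content."""
    return ET.fromstring(xml_text).text


def is_well_formed(xml_text):
    """True if the string parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


cdata_form = "<Text><![CDATA[The fence is <XYZ> meters high.]]></Text>"
escaped_form = "<Text>The fence is &lt;XYZ&gt; meters high.</Text>"

# Both forms yield the same plain string: for the XML parser, and therefore
# for the XSLT processor, the angle brackets are text, not markup.
print(text_of(cdata_form))
print(text_of(escaped_form))

# A raw HTML void element such as <br> is not well-formed XML.
print(is_well_formed("<Text>line one<br>line two</Text>"))   # unclosed <br>
print(is_well_formed("<Text>line one<br/>line two</Text>"))  # XHTML-style
```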

There are tools, such as tidy (HTML Tidy), that can parse HTML and ensure that it is well-formed.

If you know that it is well-formed, then make it actual XHTML inside <Text>. It will then serialize properly, without having to parse the string into XML or use tricks to serialize it without escaping.
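A minimal sketch of that approach on the generator side, again assuming a Python generator: try to parse the fragment as XML, and only when that succeeds store it as real child elements of <Text>; otherwise fall back to storing it as text. build_text_element is a hypothetical name. On the XSLT side, real child elements would then typically be copied with something like <xsl:copy-of select="p:Text/node()"/> instead of xsl:value-of with disable-output-escaping; copied elements keep whatever namespace they have in the source document.

```python
import xml.etree.ElementTree as ET


def build_text_element(fragment):
    """Return a <Text> element holding the fragment as XHTML if it is well-formed."""
    try:
        # Wrapping the fragment gives it a single root element, as XML requires.
        return ET.fromstring("<Text>" + fragment + "</Text>")
    except ET.ParseError:
        # Not well-formed (e.g. an unclosed <br> or <XYZ>): keep it as plain text,
        # which ElementTree escapes when the document is serialized.
        element = ET.Element("Text")
        element.text = fragment
        return element


good = build_text_element("The fox jumps <em>over</em> the fence.")
bad = build_text_element("The fence is <XYZ> meters high.")
print(ET.tostring(good, encoding="unicode"))  # <em> is a real child element
print(ET.tostring(bad, encoding="unicode"))   # <XYZ> is escaped as &lt;XYZ&gt;
```

Text such as "The fence is <XYZ> meters high." takes the fallback path because <XYZ> is never closed, so the whole fragment is stored as text.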



...