Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

php - Regular expression to match ">", "<", "&" chars that appear inside XML nodes

I'm trying to write a regular expression using the PCRE library in PHP.

I need a regex that matches only the &, > and < characters that appear in the text content of an XML node, not the ones that make up the tag declarations themselves.

Input XML:

<pnode>
  <cnode>This string contains > and < and & chars.</cnode>
</pnode>

The idea is to search for these characters and replace them with their XML entity equivalents.

If I were to convert the entire XML to entities, it would look like this:

Entire XML converted to entities

&lt;pnode&gt;
  &lt;cnode&gt;This string contains &gt; and &lt; and &amp; chars.&lt;/cnode&gt;
&lt;/pnode&gt;
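For reference, that over-eager result is what PHP's built-in htmlspecialchars() gives when it is run over the whole document, markup included. The minimal sketch below reproduces it with the sample input. The ENT_XML1 | ENT_NOQUOTES flags are an assumption chosen for illustration, not something stated in the question.

```php
<?php
// The sample document from the question.
$xml = "<pnode>\n  <cnode>This string contains > and < and & chars.</cnode>\n</pnode>";

// Escape every &, < and > in the string, including the ones that form the tags.
// ENT_XML1 applies XML rules; ENT_NOQUOTES leaves quote characters untouched.
$escapedEverything = htmlspecialchars($xml, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');

echo $escapedEverything, "\n";
```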

I need it to look like this:

Correct XML

<pnode>
  <cnode>This string contains &gt; and &lt; and &amp; chars.</cnode>
</pnode>

I have tried to write a regular expression that matches these characters using a lookahead, but I don't know enough to get it to work. My attempt (currently only trying to match > symbols):

/>(?=[^<]*<)/g
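Two side notes on the attempt above. PHP's preg_* functions do not accept a /g modifier (preg_replace() already replaces every match, and PHP warns "Unknown modifier 'g'"). Also, the lookahead matches the > that closes each tag, because that > is likewise followed by some text and then a <. A different regex-based approach, sketched here only as an assumption-laden heuristic, is to split the document on anything shaped like a simple element tag and escape only the pieces in between. The helper name escapeTextNodes and the tag pattern are hypothetical, and the sketch assumes the feed has no comments, CDATA sections or processing instructions.

```php
<?php
/**
 * Hypothetical helper: escape &, < and > in the text between tags while
 * leaving the tags themselves untouched.
 *
 * Heuristic: anything shaped like a start, end or empty element tag
 * (optionally with quoted attributes) is treated as markup. Comments,
 * CDATA sections, processing instructions and text such as "a<b>" are
 * not handled.
 */
function escapeTextNodes(string $xml): string
{
    // A single capturing group around the whole tag, so preg_split keeps the tags.
    $tagPattern = '~(</?[A-Za-z_][\w.:-]*(?:\s+[A-Za-z_][\w.:-]*\s*=\s*(?:"[^"<]*"|\'[^\'<]*\'))*\s*/?>)~';

    $parts = preg_split($tagPattern, $xml, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        throw new RuntimeException('preg_split() failed');
    }

    foreach ($parts as $i => $part) {
        // With PREG_SPLIT_DELIM_CAPTURE, even indices hold the text between
        // tags and odd indices hold the captured tags.
        if ($i % 2 !== 0) {
            continue;
        }
        // Escape a bare & unless it already starts an XML entity or
        // character reference (negative lookahead).
        $text = preg_replace('/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $part);
        // Then escape the angle brackets left in the text.
        $parts[$i] = str_replace(['<', '>'], ['&lt;', '&gt;'], $text);
    }

    return implode('', $parts);
}

$xml = "<pnode>\n  <cnode>This string contains > and < and & chars.</cnode>\n</pnode>";
echo escapeTextNodes($xml), "\n";
```

This remains a heuristic: any text that happens to look like a tag is passed through unchanged, so it cannot repair every malformed feed.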

To be clear, the XML I'm trying to fix comes from a third party, and they seem unable to fix it on their end, hence my attempt to fix it myself.

1 Answer

In the end, I opted to use PHP's Tidy extension. This is the code I used:

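  // Tidy configuration: treat the input as XML, suppress warnings,
  // write any entities other than &amp;, &lt;, &gt; and &quot; in numeric
  // form, and serialise the result as XML.
  $config = array(
    'input-xml'        => true,
    'show-warnings'    => false,
    'numeric-entities' => true,
    'output-xml'       => true);

  $tidy = new tidy();
  $tidy->parseFile('feed.xml', $config, 'latin1');  // read the feed as Latin-1
  $tidy->cleanRepair();

  // Fetch the repaired document as a string.
  $fixedXml = tidy_get_output($tidy);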

This works perfectly, correcting all the encoding errors and converting invalid characters to XML entities.