Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Retaining text formatting in XML

I have some data that I would like to store in an XML file. (It doesn't have to be XML, but XML is a nice, open format.)

The data consists of nodes and child nodes (no limit on depth), and every single node can have some text.

My data might look something like this:

<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">
      Here is some text for child1.
    </node>
    <node title="child2">
      Here is some text for child2.
    </node>
    <node title="child3">
      Here is some text for child3.
    </node>
    Here is some text for root.
  </node>
</nodes>

But the problem with this approach is that I'm ending up with a lot of whitespace that wasn't in the original text. For example, the text for my root node has 10 newlines and a bunch of tabs (or spaces) in order to format the child nodes nicely.
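To see the extra whitespace concretely, here is a minimal Python sketch. It parses the document above with the standard-library `xml.etree.ElementTree` and collects the root node's own text, which ElementTree splits between the element's `.text` and the `.tail` of each direct child. The helper name `own_text` is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET

# The indented document from the question, reproduced verbatim.
INDENTED = """<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">
      Here is some text for child1.
    </node>
    <node title="child2">
      Here is some text for child2.
    </node>
    <node title="child3">
      Here is some text for child3.
    </node>
    Here is some text for root.
  </node>
</nodes>"""


def own_text(elem):
    """Hypothetical helper: text belonging directly to elem, excluding its children.

    ElementTree stores text before the first child in elem.text and text
    after each child in that child's .tail attribute.
    """
    return (elem.text or "") + "".join(child.tail or "" for child in elem)


root = ET.fromstring(INDENTED).find("node[@title='root']")
root_text = own_text(root)
print(repr(root_text))
```

The printed string interleaves the layout indentation with the actual sentence. The parser passes all of it through as character data, so nothing in the file distinguishes "formatting" whitespace from "real" whitespace.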

What's a good way to store data like this in XML while retaining the original text exactly, without adding any extra whitespace characters?

Note: I assume I could just have all the data without newlines or indents like this:

<?xml version="1.0" ?>
<nodes>
  <node title="root"><node title="child1">Here is some text for child1.
</node><node title="child2">Here is some text for child2.
</node><node title="child3">Here is some text for child3.
</node>Here is some text for root.
</node>
</nodes>

I guess that eliminates any new whitespace. But is that the best way? It's about as ugly as it could be. And some XML viewers might format the tags by adding whitespace.
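A quick check with Python's `xml.etree.ElementTree` supports both observations. The compact form yields the root text exactly. `ET.indent` (available since Python 3.9) stands in here for a viewer or editor that reformats markup, and it inserts whitespace into that same text. `own_text` is a hypothetical helper, not a library function.

```python
import xml.etree.ElementTree as ET

# The compact document from the question, reproduced verbatim.
COMPACT = """<?xml version="1.0" ?>
<nodes>
  <node title="root"><node title="child1">Here is some text for child1.
</node><node title="child2">Here is some text for child2.
</node><node title="child3">Here is some text for child3.
</node>Here is some text for root.
</node>
</nodes>"""


def own_text(elem):
    """Hypothetical helper: elem.text plus the .tail of each direct child."""
    return (elem.text or "") + "".join(child.tail or "" for child in elem)


doc = ET.fromstring(COMPACT)
root = doc.find("node[@title='root']")
before = own_text(root)

# Pretty-print in place (Python 3.9+). ET.indent only fills in text/tails
# that are empty or whitespace-only, but in mixed content those slots are
# part of the root node's text.
ET.indent(doc)
after = own_text(root)

print(repr(before))
print(repr(after))
```

So the compact layout does retain the text exactly, but only for as long as no tool reformats it.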

Question from: https://stackoverflow.com/questions/66045619/can-someone-please-help-me-explain-to-others-about-non-significant-whitespace-in

1 Answer

Let's separately consider unmixed and mixed content:

Unmixed Content

When no text can be mixed between your elements, simply manage whitespace within elements as you wish, and allow XML serializers and editors to manage the whitespace between elements:

<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">Here is some text for child1.</node>
    <node title="child2">Here is some text for child2.</node>
    <node title="child3">Here is some text for child3.</node>
  </node>
</nodes>

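This works fine for both data-oriented and document-oriented XML. (OOXML is an example of document-oriented XML that doesn't need mixed content.) Note that in this form a node with child nodes cannot also hold text directly, which is why the example above omits the root's text; that text would have to move into a dedicated child element.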
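With unmixed content, whitespace between elements can be added or removed freely without touching the data. The Python sketch below uses the standard-library `xml.etree.ElementTree` to strip the inter-element whitespace from the example above, re-indent the result with `ET.indent` (Python 3.9+), and confirm that every leaf's text survives unchanged. The helpers `leaf_texts` and `strip_between_elements` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The unmixed example: text appears only in nodes that have no child nodes.
UNMIXED = """<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">Here is some text for child1.</node>
    <node title="child2">Here is some text for child2.</node>
    <node title="child3">Here is some text for child3.</node>
  </node>
</nodes>"""


def leaf_texts(doc):
    """Hypothetical helper: map each leaf <node>'s title to its exact text."""
    return {n.get("title"): n.text or "" for n in doc.iter("node") if len(n) == 0}


def strip_between_elements(doc):
    """Hypothetical helper: drop whitespace-only text that sits between elements.

    Only elements that contain child elements are touched, so a leaf's own
    text (even a whitespace-only one) is never modified.
    """
    for elem in doc.iter():
        if len(elem) == 0:
            continue
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None


expected = leaf_texts(ET.fromstring(UNMIXED))

doc = ET.fromstring(UNMIXED)
strip_between_elements(doc)
minified = ET.tostring(doc, encoding="unicode")

reformatted = ET.fromstring(minified)
ET.indent(reformatted, space="\t")  # Python 3.9+; mimics an editor reformatting

print(minified)
print(leaf_texts(ET.fromstring(minified)) == expected)
print(leaf_texts(reformatted) == expected)
```

Because leaf text never shares a parent with other elements' text, a serializer can treat all whitespace between elements as insignificant without risk to the data.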

Mixed Content

When text can be mixed between your elements, decide how to manage whitespace depending upon the semantics of your data. For example, if your data is like HTML, where multiple consecutive spaces mean the same as a single space, allowing XML serializers and editors to manage the whitespace is fine:

<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">Here is some text for child1. </node>
    <node title="child2">Here is some text for child2. </node>
    <node title="child3">Here is some text for child3. </node>
    Here is some text for root.
  </node>
</nodes>
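Under HTML-like rules the application normalizes text when reading instead of relying on the file's exact whitespace. A minimal Python sketch, assuming HTML-style collapsing of every whitespace run to a single space (the `html_like_text` helper is hypothetical): the indented example above and a compact version of the same content produce identical text.

```python
import re
import xml.etree.ElementTree as ET

# The mixed-content example from the answer (XML declaration omitted).
INDENTED = """<nodes>
  <node title="root">
    <node title="child1">Here is some text for child1. </node>
    <node title="child2">Here is some text for child2. </node>
    <node title="child3">Here is some text for child3. </node>
    Here is some text for root.
  </node>
</nodes>"""

# The same content with every formatting newline and indent removed.
COMPACT = (
    '<nodes><node title="root">'
    '<node title="child1">Here is some text for child1. </node>'
    '<node title="child2">Here is some text for child2. </node>'
    '<node title="child3">Here is some text for child3. </node>'
    "Here is some text for root.</node></nodes>"
)


def html_like_text(xml_source):
    """Hypothetical helper: concatenate all character data in document order,
    then collapse whitespace runs to one space and trim the ends, roughly as
    an HTML renderer treats ordinary text."""
    root = ET.fromstring(xml_source)
    return re.sub(r"\s+", " ", "".join(root.itertext())).strip()


print(html_like_text(INDENTED))
print(html_like_text(INDENTED) == html_like_text(COMPACT))
```

The trailing space inside each child in the example matters: without it, the compact form would collapse to `child1.Here is some text...`, running two sentences together, while the indented form would still get a separating space from its newline.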

xml:space

If some portion of your XML attaches significance to embedded whitespace, you can signal this by adding the special xml:space="preserve" attribute to the containing element. Section 2.10 of the W3C XML 1.0 Recommendation, "White Space Handling", says:

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose values are one or both of "default" and "preserve".

You should take care to use xml:space="preserve" conservatively, however. Placing it on the root element of a complex XML format such as OOXML is likely to make consumers of your data justifiably unhappy.
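Keep in mind that xml:space does not change what a parser reports; the parser passes all whitespace through either way, and the application is responsible for honoring the signal. The Python sketch below shows one way an application might do that, assuming HTML-like collapsing as its default behavior. `presented_texts` is a hypothetical helper, and because ElementTree is non-validating, the sketch uses the attribute without a DTD declaration.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports attributes in the reserved xml: namespace by full URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

DOC = """<nodes>
  <node title="verse" xml:space="preserve"><node title="line1">Roses  are red,</node><node title="line2">  violets are blue.</node></node>
  <node title="prose">Collapse   these
    runs.</node>
</nodes>"""


def presented_texts(elem, inherited="default", out=None):
    """Hypothetical helper: map each <node> title to the text an application
    would present. xml:space applies to descendants unless overridden, so the
    effective mode is passed down the tree."""
    if out is None:
        out = {}
    mode = elem.get(XML_SPACE, inherited)
    if elem.tag == "node":
        text = elem.text or ""
        if mode != "preserve":
            text = re.sub(r"\s+", " ", text).strip()  # application's default handling
        out[elem.get("title")] = text
    for child in elem:
        presented_texts(child, mode, out)
    return out


texts = presented_texts(ET.fromstring(DOC))
for title, text in texts.items():
    print(f"{title}: {text!r}")
```

In keeping with that caution, the attribute here is scoped to the one subtree that needs it rather than placed on `<nodes>`.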


...