Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

xml parsing - How is a String marked as bold in a Libre Office flat XML (fods) file?

Looking at the raw XML from a .fods file:

  <table:table-column table:style-name="co1" table:default-cell-style-name="ce17"/>
  <table:table-row table:style-name="ro1">
    <table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
      <text:p>John Smith</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
      <text:p>(123) 456-7890</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell office:value-type="string" calcext:value-type="string">
      <text:p>123 Main Street</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell office:value-type="string" calcext:value-type="string">
      <text:p>Anywhere, ZZ 12345-6789</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro1">
    <table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
      <text:p>Jane Doe</text:p>
    </table:table-cell>
  </table:table-row>
  <table:table-row table:style-name="ro2">
    <table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
      <text:p>(234) 567-8901</text:p>

When opened in Libre Office the names are in bold. Where would that be reflected in the above XML? I'm only seeing a value-type="string" with no markup for bold, underline, etc.

Everything is in a single column, so not quite sure what the default-cell-style-name="ce17" attribute indicates.

While the data originated as a .doc file, I'm using Libre Office on the file.

I'm looking to extract the names from the XML. The only thing that really distinguishes them from the phone numbers and addresses is that they're in bold. I suppose the names contain no digits, either, but I'd like to select the bold data from the spreadsheet.

The formatting information seems somewhat vague:

Formatting

The style and formatting controls are numerous, providing a number of controls over the display of information.

Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border (and its line width), padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.

Headers and footers can have defined fixed and minimum heights, margins, border line width, padding, background, shadow, and dynamic spacing.

There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, generic font family names (roman – serif, swiss – sans-serif, modern – monospace, decorative, script or system), and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting. The list is extremely extensive; see the references (in particular the actual standard) for details.

1 Answer

Values and formats are placed in different sections of the XML file.

Usually there is a 'style' section where each format is defined under a name (style:name).
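As a rough illustration of that style section, here is a minimal Python sketch that lists the style names whose text properties set fo:font-weight="bold". The SAMPLE document is made up for the example: the question's excerpt does not show the style definitions, so treating ce15 as the bold style is an assumption, and bold_style_names is just a helper name chosen here.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs used by flat XML (.fods) files.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hypothetical style section; the real file's definitions may differ.
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="ce15" style:family="table-cell">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="ce16" style:family="table-cell">
      <style:text-properties fo:font-size="10pt"/>
    </style:style>
  </office:automatic-styles>
</office:document>"""


def bold_style_names(root):
    """Return the names of styles that directly set fo:font-weight="bold"."""
    names = set()
    for style in root.iter("{%s}style" % NS["style"]):
        for props in style.findall("style:text-properties", NS):
            if props.get("{%s}font-weight" % NS["fo"]) == "bold":
                names.add(style.get("{%s}name" % NS["style"]))
    return names


root = ET.fromstring(SAMPLE)
print(bold_style_names(root))
```

This only looks at properties set directly on a style. A style can also inherit bold from a parent (style:parent-style-name), which the sketch does not follow.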

The table section defines the table, the values placed in it, and the style each element uses (identified by its 'table:style-name' attribute). You can define a style for each cell, for an entire row, for an entire column, or even for the entire table.

So in your case, you can identify the bold text by looking at the style name each cell uses. That's not always easy, because a column or row can specify a default cell style (default-cell-style-name="ce17"), which applies when the cell does not name a style of its own.
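The lookup described above could be sketched in Python roughly as follows. For each cell it takes the cell's own table:style-name, then falls back to the row's default-cell-style-name, then to the column's, and keeps the text if that style is bold. The SAMPLE document is a cut-down, made-up version of the question's data; that ce15 is the bold style is an assumption (the excerpt does not show the style definitions), and the function names are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical sample; assumes ce15 is the style that sets bold.
SAMPLE = f"""<office:document xmlns:office="{OFFICE}" xmlns:style="{STYLE}"
    xmlns:table="{TABLE}" xmlns:text="{TEXT}" xmlns:fo="{FO}">
  <office:automatic-styles>
    <style:style style:name="ce15" style:family="table-cell">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="ce16" style:family="table-cell"/>
    <style:style style:name="ce17" style:family="table-cell"/>
  </office:automatic-styles>
  <office:body><office:spreadsheet><table:table table:name="Sheet1">
    <table:table-column table:style-name="co1" table:default-cell-style-name="ce17"/>
    <table:table-row table:style-name="ro1">
      <table:table-cell table:style-name="ce15" office:value-type="string">
        <text:p>John Smith</text:p></table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="ro2">
      <table:table-cell table:style-name="ce16" office:value-type="string">
        <text:p>(123) 456-7890</text:p></table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="ro2">
      <table:table-cell office:value-type="string">
        <text:p>123 Main Street</text:p></table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="ro1">
      <table:table-cell table:style-name="ce15" office:value-type="string">
        <text:p>Jane Doe</text:p></table:table-cell>
    </table:table-row>
  </table:table></office:spreadsheet></office:body>
</office:document>"""


def q(ns, name):
    """Build the '{namespace}name' form that ElementTree uses for tags."""
    return f"{{{ns}}}{name}"


def bold_styles(root):
    """Names of styles that directly set fo:font-weight="bold"."""
    bold = set()
    for style in root.iter(q(STYLE, "style")):
        props = style.find(q(STYLE, "text-properties"))
        if props is not None and props.get(q(FO, "font-weight")) == "bold":
            bold.add(style.get(q(STYLE, "name")))
    return bold


def bold_cell_texts(root):
    """Text of every cell whose effective style is one of the bold styles."""
    bold = bold_styles(root)
    cell_tags = (q(TABLE, "table-cell"), q(TABLE, "covered-table-cell"))
    results = []
    for table in root.iter(q(TABLE, "table")):
        # Expand repeated columns so defaults can be looked up by index.
        col_defaults = []
        for col in table.iter(q(TABLE, "table-column")):
            repeat = int(col.get(q(TABLE, "number-columns-repeated"), "1"))
            default = col.get(q(TABLE, "default-cell-style-name"))
            col_defaults.extend([default] * repeat)
        for row in table.iter(q(TABLE, "table-row")):
            row_default = row.get(q(TABLE, "default-cell-style-name"))
            index = 0
            for cell in row:
                if cell.tag not in cell_tags:
                    continue
                col_default = col_defaults[index] if index < len(col_defaults) else None
                # Cell style first, then row default, then column default.
                name = cell.get(q(TABLE, "style-name")) or row_default or col_default
                text = "\n".join(
                    "".join(p.itertext()) for p in cell.findall(q(TEXT, "p"))
                )
                if name in bold and text:
                    results.append(text)
                index += int(cell.get(q(TABLE, "number-columns-repeated"), "1"))
    return results


root = ET.fromstring(SAMPLE)
print(bold_cell_texts(root))
```

With the sample above, the address cell has no style of its own and falls back to the column default ce17, which is not bold here, so only the two name cells are returned. Inherited styles (style:parent-style-name) and bold applied to part of a cell through text:span are not handled.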

I developed a library for parsing ODS files in Java, so if you need inspiration you can check it out on GitHub: https://github.com/miachm/SODS

