Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How to remove elements from XML using Python

I'm stuck with XML and Python. The task seems simple, but I haven't been able to solve it despite spending a long time on it. I'm hoping for advice on how to do it in a couple of lines.

Thanks for any help with traversing the tree; I always end up with too many or too few elements. Elements can be nested to any depth, and the XML below is just an example. I'll accept any solution and am not picky about dom, minidom, SAX, or anything else.

I have an XML file similar to this one:

<root>
    <elm>
        <elm>Common content</elm>

        <elm xmlns="http://example.org/ns">
            <elm lang="en">Content EN</elm>
            <elm lang="cs">Žluťoučky koníček</elm>
        </elm>

        <elm xml:id="abc123">Common content</elm>

        <elm lang="en">Content EN</elm>
        <elm lang="cs">Content CS</elm>

        <elm lang="en">
            <elm>Content EN</elm>
            <elm>Content EN</elm>
        </elm>

        <elm lang="cs">
            <elm>Content CS</elm>
            <elm>Content CS</elm>
        </elm>
    </elm>
</root>

What I need: parse the XML and write a new file that contains all elements for a given language, plus all elements that have no lang attribute.

For the "cs" language, the output file should contain this:

<root>
    <elm>
        <elm>Common content</elm>

        <elm xmlns="http://example.org/ns">
            <elm lang="cs">Žluťoučky koníček</elm>
        </elm>

        <elm xml:id="abc123">Common content</elm>

        <elm lang="cs">Content CS</elm>

        <elm lang="cs">
            <elm>Content CS</elm>
            <elm>Content CS</elm>
        </elm>
    </elm>
</root>

If the new file could also omit the lang attribute, even better, but that's not essential.

UPDATE1: Added Unicode characters and a namespace attribute.

UPDATE2: I'm using Python 2.5; standard-library solutions are preferred.
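Since UPDATE2 prefers the standard library, here is a minimal sketch of the same keep-or-drop filtering with `xml.etree.ElementTree`. It is written for Python 3 and has not been checked against Python 2.5, whose older ElementTree lacks some of these calls (for example `Element.iter()`). The function name `keep_language` and the trimmed sample are illustrative assumptions, not taken from the question. Two details matter: ElementTree elements have no `getparent()`, so the sketch builds a child-to-parent map first; and the namespaced subtree from UPDATE1 needs no special handling, because an unprefixed attribute such as `lang` is never in a namespace even though the tag itself becomes `{http://example.org/ns}elm`.

```python
import xml.etree.ElementTree as ET


def keep_language(root, lang):
    """Keep elements whose lang equals `lang` or that have no lang at all.

    Matching elements lose their lang attribute; elements in any other
    language are removed together with their whole subtree.
    (Hypothetical helper, not part of ElementTree.)
    """
    # ElementTree has no getparent(), so map each child to its parent up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Snapshot the candidates before mutating the tree.
    tagged = [elem for elem in root.iter() if elem.get('lang') is not None]
    for elem in tagged:
        if elem.get('lang') == lang:
            del elem.attrib['lang']
        else:
            parent = parent_of.get(elem)
            if parent is not None:  # the root element itself cannot be removed
                parent.remove(elem)
    return root


# Trimmed, hypothetical version of the question's input.
SAMPLE = """<root><elm>
    <elm>Common content</elm>
    <elm xmlns="http://example.org/ns">
        <elm lang="en">Content EN</elm>
        <elm lang="cs">Žluťoučky koníček</elm>
    </elm>
    <elm lang="en">Content EN</elm>
    <elm lang="cs">Content CS</elm>
    <elm lang="cs"><elm>Content CS</elm></elm>
</elm></root>"""

root = keep_language(ET.fromstring(SAMPLE), 'cs')
result = ET.tostring(root, encoding='unicode')
print(result)
```

ElementTree declares the namespace on the root element under a generated prefix such as `ns0`, so the serialized output looks different from the input but is equivalent for a namespace-aware reader. To write the result to a new file as UTF-8, `ET.ElementTree(root).write('out.xml', encoding='utf-8', xml_declaration=True)` should work; `out.xml` is a placeholder name.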


Fight a dragon for too long and you become the dragon; gaze into the abyss for too long and the abyss gazes back…

1 Answer


Using lxml:

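import lxml.etree as le

LANG = 'en'  # the language to keep

doc = le.parse('doc.xml')
# Visit every element that has a lang attribute, at any nesting depth.
# xpath() returns a list, so removing elements inside the loop is safe.
for elem in doc.xpath('//*[@lang]'):
    if elem.attrib['lang'] == LANG:
        # Keep the element, but drop the now-redundant attribute.
        elem.attrib.pop('lang')
    else:
        # Remove the element together with its whole subtree.
        elem.getparent().remove(elem)
print(le.tostring(doc, encoding='unicode'))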

yields

<root>
    <elm>Common content</elm>

    <elm>
        <elm>Content EN</elm>
        </elm>

    <elm>Common content</elm>

    <elm>Content EN</elm>
    <elm>
        <elm>Content EN</elm>
        <elm>Content EN</elm>
    </elm>

    </root>
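The uneven closing tags in that output (`        </elm>` and `    </root>`) are a side effect of removal rather than a bug in the filter: lxml and the standard `xml.etree.ElementTree` both store the whitespace that follows an element as that element's `tail`, so removing the element also discards the line break and indentation after it. The sketch below reproduces this with the standard library on a small hypothetical sample, then re-indents the tree with `ET.indent()`, which exists only in Python 3.9 and later (so not in the Python 2.5 the question mentions).

```python
import xml.etree.ElementTree as ET

# Small hypothetical sample: one child to keep and one child to remove.
SAMPLE = """<root>
    <elm lang="en">
        <elm>Content EN</elm>
        <elm lang="cs">Content CS</elm>
    </elm>
</root>"""

root = ET.fromstring(SAMPLE)
group = root[0]
# Removing the cs element also drops its tail ("\n    "), which was the
# whitespace that placed the closing </elm> at the right indentation.
group.remove(group[1])
before = ET.tostring(root, encoding='unicode')
print(before)

# Python 3.9+: overwrite whitespace-only text/tails with fresh indentation.
ET.indent(root, space='    ')
after = ET.tostring(root, encoding='unicode')
print(after)
```

Re-indenting only rewrites whitespace-only text and tails. If removed elements can be followed by real text (mixed content), that text would still be lost; in that case you would append the removed element's tail to the previous sibling's tail (or to the parent's text) before removing it.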

...