Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
176 views
in Technique[技术] by (71.8m points)

Python - Find specific XML element using lxml

Background: I understand this question has been asked, but I'm really struggling with finding the right Xpath formula. I'm working with the iTunes XML file. So Apple is really annoying how they formatted this file.... Instead of making a tag, and then the text being the ID, they only using , , <some_value> tags. This is making it REALLY confusing trying to find the right element.

I've been trying to following this question: Python ElementTree: find element by its child's text using XPath but it's just not working for me. I've been reading this (https://lxml.de/tutorial.html) document, but i'm not finding the answer I need. I'm fairly certain the Xpath is the way to go, but I'll take any better suggestions.

I did try using some itunes specific libraries. It seems like they are all out of date, or just not working how I need. I was initially using Elementtree, but the getnext() feature from lxml is a savior when it Apple formats the file this way.

Example XML file:

<plist version="1.0">
<dict>
    <key>Library Persistent ID</key><string>6948B4402F0EEFFF</string>
    <key>Tracks</key>
    <dict>
        <key>18051</key>
        <dict>
            <key>Track ID</key><integer>18051</integer>
            <key>Size</key><integer>7930116</integer>
            <key>Total Time</key><integer>196336</integer>
            <key>BPM</key><integer>86</integer>
            <key>Date Modified</key><date>2018-10-23T12:41:05Z</date>
            <key>Date Added</key><date>2017-07-25T02:49:11Z</date>
        </dict>
        <key>18053</key>
        <dict>
            <key>Track ID</key><integer>18053</integer>
            <key>Size</key><integer>9780560</integer>
            <key>Total Time</key><integer>243513</integer>
            <key>Year</key><integer>2010</integer>
            <key>BPM</key><integer>74</integer>
            <key>Date Modified</key><date>2018-10-23T12:41:09Z</date>
            <key>Date Added</key><date>2017-07-25T02:49:11Z</date>
        </dict>
        <key>18055</key>
        <dict>
            <key>Track ID</key><integer>18055</integer>
            <key>Size</key><integer>12995663</integer>
            <key>Total Time</key><integer>323604</integer>
            <key>Year</key><integer>2005</integer>
            <key>BPM</key><integer>76</integer>
            <key>Date Modified</key><date>2018-10-23T12:41:14Z</date>
        <key>Date Added</key><date>2017-07-25T02:49:11Z</date>
        </dict>

The Approach: So let's say I need to find the element that has the ID '18053', Instead of iterating over everything recursively (which I've figured out how to do), it would be much more efficient if I could check for the ID #. Then get the element of the that's after

I've tried the following:

root = etree.parse(xml_file_path)
key_found = root.xpath("//key[text()='18053']")
element_wanted = key_found.getnext()

But get an error because key_found is a list, not an element.

Using find, should return an element, but using the following:

key_found = root.find("//key[text()='18053']")

I get an error "bad predicate"

Any help is appreciated. I've been working on this one for a few days now. Blame Apple! Thanks!

question from:https://stackoverflow.com/questions/66056798/python-find-specific-xml-element-using-lxml

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The xpath method should return a list, so

element_wanted = key_found.getnext()

would be

element_wanted = key_found[0].getnext()

Provided you also test if the list contains elements


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...