xml - stripping inline tags with python's lxml

Question

Welcome To Ask or Share your Answers For Others

xml - stripping inline tags with python's lxml

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

xml - stripping inline tags with python's lxml

I have to deal with two types of inline tags in xml documents. The first type of tags enclose text that I want to keep in-between. I can deal with this with lxml's

etree.tostring(element, method="text", encoding='utf-8')

The second type of tags include text that I don't want to keep. How can I get rid of these tags and their text? I would prefer not to use regular expressions, if possible.

Thanks

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:56:04+0000

I think that strip_tags and strip_elements are what you want in each case. For example, this script:

from lxml import etree

text = "<x>hello, <z>keep me</z> and <y>ignore me</y>, and here's some <y>more</y> text</x>"

tree = etree.fromstring(text)

print etree.tostring(tree, pretty_print=True)

# Remove the <z> tags, but keep their contents:
etree.strip_tags(tree, 'z')

print '-' * 72
print etree.tostring(tree, pretty_print=True)

# Remove all the <y> tags including their contents:
etree.strip_elements(tree, 'y', with_tail=False)

print '-' * 72
print etree.tostring(tree, pretty_print=True)

... produces the following output:

<x>hello, <z>keep me</z> and <y>ignore me</y>, and
here's some <y>more</y> text</x>

------------------------------------------------------------------------
<x>hello, keep me and <y>ignore me</y>, and
here's some <y>more</y> text</x>

------------------------------------------------------------------------
<x>hello, keep me and , and
here's some  text</x>

Categories

xml - stripping inline tags with python's lxml

xml - stripping inline tags with python's lxml

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags