Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

xml - How to use XMLStarlet / XPath to select text inside <div> but exclude some inner <span>

I have HTML files like the one below. Each div contains text with one inner span, and the rest of the text has a fairly arbitrary format.

<html>
<div>
<span class="c1">Text1</span><br/>
Text4<br/>
Text5
</div>
<div>
<span class="c1">TextA</span><a href="...">TextD</a>
</div>
</html>

It is trivial to select and print only the text inside the span with xml sel -t -m "/html/div" -v "span[@class='c1']" -n

However, I don't know how to select and print the rest of the text, the part inside the div but outside the span, regardless of any other tags such as <br/>. The function text() does not work as I expected.

xml sel -t -m "/html/div" -v "concat(span[@class='c1'],'|',text(),'$')" -n cuts off the text that follows the <br/> tags.
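A likely reason for the cut-off: in XPath 1.0, a node-set passed to a string function such as concat() is converted using only its first node in document order, and text() on a div returns several separate text nodes, split wherever a child element occurs. The sketch below mimics that split with Python's standard xml.etree.ElementTree rather than XMLStarlet; the helper name direct_text_nodes is hypothetical and only serves the illustration.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<html>
<div>
<span class="c1">Text1</span><br/>
Text4<br/>
Text5
</div>
<div>
<span class="c1">TextA</span><a href="...">TextD</a>
</div>
</html>"""


def direct_text_nodes(div):
    """Return the div's own text nodes, like XPath text(), in document order.

    ElementTree stores them as div.text (before the first child) and as
    child.tail (after each child element). Hypothetical helper for illustration.
    """
    nodes = []
    if div.text is not None:
        nodes.append(div.text)
    for child in div:
        if child.tail is not None:
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(SAMPLE)
first_div = root.findall("div")[0]
nodes = direct_text_nodes(first_div)

# Several separate text nodes, not one string.
print(nodes)
# The first node is only the line break before the span.
print(repr(nodes[0]))
```

In this sample the first text node of the first div is just whitespace, so a string conversion that keeps only the first node never reaches Text4 or Text5.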

How can I get output like this?

Text1|
Text4
Text5$
TextA|TextD$
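For comparison, here is a minimal sketch that produces this output with Python's standard xml.etree.ElementTree instead of XMLStarlet. It assumes that whitespace-only text nodes should be dropped and that trailing whitespace before the closing $ should be trimmed, which is a guess based on the desired output shown above. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<html>
<div>
<span class="c1">Text1</span><br/>
Text4<br/>
Text5
</div>
<div>
<span class="c1">TextA</span><a href="...">TextD</a>
</div>
</html>"""


def is_label(el):
    """True for the span whose text must be excluded from the remainder."""
    return el.tag == "span" and el.get("class") == "c1"


def text_outside_label(div):
    """Join all non-blank text nodes under div that are not inside span.c1."""
    pieces = []

    def walk(el):
        if el.text and el.text.strip():
            pieces.append(el.text)
        for child in el:
            if not is_label(child):
                walk(child)  # descend into <br/>, <a>, and so on
            # The tail belongs to the parent, so it is kept even for the span.
            if child.tail and child.tail.strip():
                pieces.append(child.tail)

    walk(div)
    # Assumption: trailing whitespace before '$' is not wanted.
    return "".join(pieces).rstrip()


def format_div(div):
    """Build 'span text|remaining text$' for one div."""
    label = div.find(".//span[@class='c1']")
    label_text = "".join(label.itertext()) if label is not None else ""
    return f"{label_text}|{text_outside_label(div)}$"


root = ET.fromstring(SAMPLE)
output = "\n".join(format_div(div) for div in root.findall("div"))
print(output)
```

Walking the tree by hand is needed here because ElementTree supports only a small XPath subset and has no text() node test.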

Question from: https://stackoverflow.com/questions/65864609/how-to-use-xmlstarlet-xpath-to-select-text-inside-div-but-exclude-some-inner

1 Answer

I tested several XPath expressions, and the best one I found is:

//div/descendant-or-self::*/text()[normalize-space()]

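The expression starts from each div, takes the div itself and all of its descendant elements (descendant-or-self::*), and selects their text nodes, keeping only those that are not empty or whitespace-only ([normalize-space()]). Note that this also matches the text inside the span.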
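To see which text nodes this expression matches on the sample from the question, the selection can be approximated in Python with the standard xml.etree.ElementTree module. This is a rough sketch, not the XMLStarlet evaluation itself: itertext() plays the role of descendant-or-self::*/text(), strip() stands in for the normalize-space() predicate, and the function name non_blank_text_nodes is hypothetical. It also assumes divs are not nested, as in the sample; nested divs would be visited twice here, whereas an XPath node-set contains each node once.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """<html>
<div>
<span class="c1">Text1</span><br/>
Text4<br/>
Text5
</div>
<div>
<span class="c1">TextA</span><a href="...">TextD</a>
</div>
</html>"""


def non_blank_text_nodes(root):
    """Approximate //div/descendant-or-self::*/text()[normalize-space()]."""
    result = []
    for div in root.iter("div"):
        # itertext() yields every text node under the div in document order.
        result.extend(t for t in div.itertext() if t.strip())
    return result


root = ET.fromstring(SAMPLE)
texts = non_blank_text_nodes(root)
for t in texts:
    print(repr(t))
```

The result contains the span text (Text1, TextA) alongside the other text nodes, so the span still has to be filtered out separately if it should be excluded.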


