I would like to extract the text from an HTML document while keeping the links in it. For example:
From this HTML code:
<div class="CssClass21">bla1 bla1 bla1 <a href="http://www.ibrii.com">go to ibrii</a> bla2 bla2 bla2 <img src="http://www.contoso.com/hello.jpg"> <span class="cssClass34">hello hello</span></div>
I would like to extract just this:
bla1 bla1 bla1 <a href="http://www.ibrii.com">go to ibrii</a> bla2 bla2 bla2 hello hello
In another post on Stack Overflow I found the regex <[^>]*>, which extracts the text when every match is replaced with an empty string. How can I exclude the anchor tags from the match? It seems that regular expressions do not support inverse matching.
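A minimal sketch of the tag-stripping approach described above, assuming Python and its standard re module (the question does not name a language). Most regex engines, including Python's, do support a negative lookahead. The pattern <(?!/?a\b)[^>]*> is not from the original post. It only matches a tag when the text right after "<" is not "a" or "/a" followed by a word boundary, so anchor tags survive while every other tag is removed. The \b keeps tags such as <abbr> from being mistaken for anchors. The helper name strip_tags and the whitespace collapsing are illustrative choices, not part of the question.

```python
import re

# Sample input from the question (with the closing </div> added).
html = ('<div class="CssClass21">bla1 bla1 bla1 <a href="http://www.ibrii.com">go to ibrii</a> '
        'bla2 bla2 bla2 <img src="http://www.contoso.com/hello.jpg"> '
        '<span class="cssClass34">hello hello</span></div>')

# The regex from the question: matches every tag.
ALL_TAGS = re.compile(r'<[^>]*>')

# Negative lookahead: skip opening (<a ...>) and closing (</a>) anchor tags.
# IGNORECASE also keeps <A HREF=...> and </A>.
NON_ANCHOR_TAGS = re.compile(r'<(?!/?a\b)[^>]*>', re.IGNORECASE)


def strip_tags(markup, pattern):
    """Hypothetical helper: delete every match of pattern from markup."""
    text = pattern.sub('', markup)
    # Removed tags can leave double spaces behind; collapse runs of whitespace.
    return re.sub(r'\s+', ' ', text).strip()


print(strip_tags(html, ALL_TAGS))
# → bla1 bla1 bla1 go to ibrii bla2 bla2 bla2 hello hello
print(strip_tags(html, NON_ANCHOR_TAGS))
# → bla1 bla1 bla1 <a href="http://www.ibrii.com">go to ibrii</a> bla2 bla2 bla2 hello hello
```

Like the original <[^>]*>, this is a text-level heuristic rather than an HTML parser. It will misbehave on a ">" inside an attribute value, on comments, or on script and style content. For arbitrary documents, a real HTML parser is the safer choice.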