Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Grabbing meta-tags and comments using HTML Agility Pack

I've looked for tutorials on using HTML Agility Pack, as it seems to do everything I want, but for such a powerful tool there is surprisingly little written about it on the Internet.

I am writing a simple method that will retrieve any given tag based on name:

public string[] GetTagsByName(string TagName, string Source) {
    ...
}

This could easily be done with a regular expression, but we all know that using regex to parse HTML isn't right. So far I have the following code:

...
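// TODO: Clear Comments (can this be done or should I use RegEx?)
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Source);
ArrayList tags = new ArrayList();
string xpath = "//" + TagName;
// SelectNodes returns null rather than an empty collection when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
if (nodes != null) {
    // Matched elements are HtmlNode instances, not HtmlTextNode
    foreach (HtmlNode node in nodes) {
        tags.Add(node.InnerText);
    }
}
return (string[])tags.ToArray(typeof(String));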

I would like to first strip all comments from the HTML, then return the matching tags based on name. If possible, I'd also like to return certain meta-tags based on an attribute, such as name="robots". I'm not that great with XPath, so any help with that would be good.

Any help would be much appreciated.


He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…

1 Answer


HtmlAgilityPack's HtmlDocument implements IXPathNavigable, so it uses the standard .NET XPath engine. Any XPath 1.0 documentation will apply, especially if it covers System.Xml.XPath.
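As a rough illustration of the IXPathNavigable point, the sketch below queries a parsed document through the System.Xml.XPath types instead of SelectNodes. The class name MetaNavigatorDemo and the method ListMetaNames are hypothetical names made up for this example, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using System.Collections.Generic;
using System.Xml.XPath;
using HtmlAgilityPack;

// Hypothetical demo class, not part of HtmlAgilityPack.
public static class MetaNavigatorDemo
{
    // Returns the "name" attribute of every meta element, in document order.
    public static List<string> ListMetaNames(string source)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(source);

        // HtmlDocument implements IXPathNavigable, so it can hand out
        // a standard XPathNavigator.
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator it = nav.Select("//meta");

        var names = new List<string>();
        while (it.MoveNext())
        {
            // GetAttribute takes a local name and a namespace URI;
            // HTML attributes have no namespace, hence the empty string.
            names.Add(it.Current.GetAttribute("name", string.Empty));
        }
        return names;
    }
}
```

For everyday use, DocumentNode.SelectNodes is usually more convenient; the navigator route mostly matters when existing code already works with XPathNavigator.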

"//comment()" finds all comments
"//meta" finds all "meta" elements
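Putting those two expressions to work, here is a minimal sketch of what the question asks for: strip comments first, then fetch tags by name or a meta-tag by its name attribute. TagGrabber, StripComments and GetMetaContent are hypothetical names chosen for illustration, and the sketch assumes metaName contains no quote characters, since it is concatenated straight into the XPath string.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class TagGrabber
{
    // Removes every comment node from the parsed document.
    public static void StripComments(HtmlDocument doc)
    {
        HtmlNodeCollection comments = doc.DocumentNode.SelectNodes("//comment()");
        if (comments == null) return; // null means no comments were found
        foreach (HtmlNode comment in comments)
        {
            comment.ParentNode.RemoveChild(comment);
        }
    }

    // Returns the inner text of every element named tagName, comments stripped.
    public static string[] GetTagsByName(string tagName, string source)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(source);
        StripComments(doc);

        var tags = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//" + tagName);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes) tags.Add(node.InnerText);
        }
        return tags.ToArray();
    }

    // Returns the content attribute of <meta name="metaName">, or "" if absent.
    public static string GetMetaContent(string metaName, string source)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(source);
        HtmlNode meta = doc.DocumentNode.SelectSingleNode("//meta[@name='" + metaName + "']");
        return meta == null ? string.Empty : meta.GetAttributeValue("content", string.Empty);
    }
}
```

A call such as TagGrabber.GetMetaContent("robots", html) would then read the robots directive. Note that the attribute value comparison in XPath 1.0 is case-sensitive, so name="ROBOTS" would not match this expression as written.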

HtmlDocument was designed to look very much like XmlDocument, so examples and tutorials about XmlDocument will be somewhat applicable.

Some MSDN links:



...