Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - How to extract dynamic AJAX content from a web page

My requirement is to extract specific content from a web page. The page has a section that is populated using AJAX. When I view the page source, it does not show the content loaded via AJAX. The section's content changes based on which check box is selected: if we select the 'India' check box, the section displays all the details for India. The page source shows only the default content, not the content loaded via AJAX. I checked the page source again after selecting the check box, and it still shows only the default value. How can I get that section's content?

1 Answer

In C# you can use HtmlAgilityPack to scrape the data, but if you feed it webBrowser.DocumentText you will not get the AJAX-loaded content, so your XPath queries will not find it. Instead, wait until the WebBrowser control has loaded the page completely, then read the live DOM. In the DocumentCompleted event handler, add the code below:

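// IHTMLDocument2 comes from the mshtml namespace (COM reference: Microsoft.mshtml).
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// DomDocument exposes the live DOM, including content injected by AJAX.
IHTMLDocument2 currentDoc = (IHTMLDocument2)this.webBrowser1.Document.DomDocument;

// Use body rather than activeElement: activeElement is whichever element has
// focus (for example the check box just clicked), so its innerHTML may be empty.
doc.LoadHtml(currentDoc.body.innerHTML);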
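
To show how the pieces above might fit together, here is a minimal sketch of a complete WinForms form. It is an illustration under stated assumptions, not the answer's exact method: it reads the live markup through the managed Document.Body.OuterHtml property instead of casting to IHTMLDocument2, which avoids the mshtml reference. The URL, the XPath expression, the ScraperForm class and the "Extract" button are all hypothetical placeholders. HtmlAgilityPack must be installed separately (for example from NuGet).

```csharp
using System;
using System.Text;
using System.Windows.Forms;

public class ScraperForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    // Hypothetical button: lets you re-read the DOM after ticking a check box.
    private readonly Button extractButton = new Button { Text = "Extract", Dock = DockStyle.Top };

    public ScraperForm()
    {
        Controls.Add(webBrowser1);
        Controls.Add(extractButton);
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        extractButton.Click += (s, e) => ExtractSection();
        webBrowser1.Navigate("http://example.com/"); // placeholder URL (assumption)
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted also fires for each frame; react only to the top-level page.
        if (e.Url != webBrowser1.Url) return;
        ExtractSection();
    }

    private void ExtractSection()
    {
        // The document or its body may not exist yet if the page has not loaded.
        if (webBrowser1.Document == null || webBrowser1.Document.Body == null) return;

        // OuterHtml reflects the current DOM, not the originally downloaded source.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(webBrowser1.Document.Body.OuterHtml);

        // Hypothetical XPath; replace it with the selector for the real section.
        var nodes = doc.DocumentNode.SelectNodes("//div[@id='details']");
        if (nodes == null) return; // SelectNodes returns null when nothing matches

        var text = new StringBuilder();
        foreach (var node in nodes)
        {
            text.AppendLine(node.InnerText.Trim());
        }
        MessageBox.Show(text.ToString());
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ScraperForm());
    }
}
```

Because the section in the question changes only after a check box is selected, the content may not be present yet when DocumentCompleted fires. In this sketch you would tick the check box inside the embedded browser and then click "Extract" to parse the updated DOM.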
