C# code for getting a page's HTML content after its JavaScript has run [draft, not yet verified]


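How can a crawler get the HTML source of a page after its JS has executed? For example, when I click Query on a page, a table is generated automatically to hold the data, but when I right-click and view the page source, the JS-generated table is not there.
Firefox's debugger does show it. Reference URL: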

http://www.hfepb.gov.cn/kqzt.aspx

The generated table is visible on the page, but the numbers do not appear in the page source.

A solution found online:
[Set the webBrowser's Url, pass the fetched source to webBrowser.Document, wait for webBrowser.DocumentCompleted, and then read webBrowser.Document; that should be enough.]

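After pressing F12 and choosing [Cache] > [Clear ... for this domain], the problem went away and I could get the complete HTML produced after the JS had run. Before the next run the cache had to be cleared by hand again, otherwise the post-JS data was still missing. That was the breakthrough: clearing the cache from code solved the problem.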

This problem troubled me for two days before I finally found a solution:

        // Requires: using System; using System.IO; using System.Windows.Forms;

        /// <summary>
        /// Gets the content of a JS-driven page. Firefox's "Inspect Element" can show the same content.
        /// </summary>
        private void PrintHelpPage()
        {
            // Create a WebBrowser instance.
            WebBrowser webBrowserForPrinting = new WebBrowser();

            // Add an event handler that reads the document after it loads.
            webBrowserForPrinting.DocumentCompleted +=
                new WebBrowserDocumentCompletedEventHandler(PrintDocument);

            // Deleting the cache is the key step and must be done;
            // otherwise the data produced by the JS is not returned.
            string cachePath = Environment.GetFolderPath(Environment.SpecialFolder.InternetCache); // cache path
            DirectoryInfo di = new DirectoryInfo(cachePath);
            try
            {
                // Walk every subfolder and delete the files inside.
                foreach (FileInfo fi in di.GetFiles("*.*", SearchOption.AllDirectories))
                {
                    try
                    {
                        fi.Delete();
                    }
                    catch { } // best effort: skip files that are locked or protected
                }
            }
            catch (UnauthorizedAccessException) { } // a subfolder could not be listed

            // Set the Url property to load the document.
            webBrowserForPrinting.Url = new Uri("http://218.23.98.205:8080/aqi/components/aqi/explainDay.jsp");
        }

        private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            WebBrowser browser = (WebBrowser)sender;

            //MessageBox.Show("000");
            //foreach (HtmlElement he in ((WebBrowser)sender).Document.GetElementById("sljaqi"))
            //{
            //    //if (he.GetAttribute("classname") == "co_yl")
            //    //{
            //    //    // Then parse out the information you need based on the page's markup.
            //    //}
            //    MessageBox.Show(he.OuterText);
            //    MessageBox.Show(he.Name);
            //}

            // GetElementById returns null when the element is missing, so check before reading InnerHtml.
            HtmlElement element = browser.Document.GetElementById("sljaqi");
            MessageBox.Show(element != null ? element.InnerHtml : "Element \"sljaqi\" was not found.");

            // Print the document now that it is fully loaded.
            //browser.Print();

            // Dispose the WebBrowser now that the task is complete.
            browser.Dispose();
        }