Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
484 views
in Technique by (71.8m points)

javascript - Crawl a dynamic web page using HtmlUnit

I am using HtmlUnit to crawl data from a dynamic web page that uses infinite scrolling to fetch data, just like Facebook's news feed. I used the following code to simulate the scroll-down event:

webclient.setJavaScriptEnabled(true);
webclient.setAjaxController(new NicelyResynchronizingAjaxController());
ScriptResult sr = myHtmlPage.executeJavaScript("window.scrollBy(0, 600)");
webclient.waitForBackgroundJavaScript(10000);
myHtmlPage = (HtmlPage) sr.getNewPage();

But myHtmlPage seems to stay the same as before, i.e., no new data is appended to it. As a result, I can only crawl the first few items on the web page. Thanks for your help!


1 Answer

0 votes
by (71.8m points)

I was searching for the same thing. All I was able to find is that the loading is not triggered by the scroll event (I am about 90% sure). There is a link in the page's JavaScript which is responsible for loading the next part of the page, and it could maybe help you.
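
To illustrate the idea of looking for that link instead of simulating a scroll, here is a minimal sketch in plain Java. It is only an assumption about how such a page might be built: the class name NextBatchUrlFinder and the variable name nextPageUrl are hypothetical, and a real page will most likely embed its "load more" URL in a different way, so the pattern has to be adapted after inspecting the page source.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: looks for the URL that a page's script would request for the next batch. */
final class NextBatchUrlFinder {

    // Assumption: the page source contains something like
    //   nextPageUrl = "/feed?page=2"   or   nextPageUrl: '/feed?page=2'
    // The variable name is made up for this sketch.
    private static final Pattern NEXT_URL =
            Pattern.compile("nextPageUrl\\s*[:=]\\s*[\"']([^\"']+)[\"']");

    /** Returns the first embedded next-batch URL, or an empty Optional if none is present. */
    static Optional<String> find(String pageSource) {
        Matcher m = NEXT_URL.matcher(pageSource);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

If a URL is found, it could be resolved against the address of the current page and requested directly (for example with webclient.getPage(...)), repeating until no further URL turns up. That way the crawl does not depend on the scroll event firing at all.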

