Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
562 views
in Technique[技术] by (71.8m points)

jsoup - Fetch contents(loaded through AJAX call) of a web page

I am a beginner to crawling. I have a requirement to fetch the posts and comments from a link. I want to automate this process. I considered using webcrawler and jsoup for this but was told that webcrawlers are mostly used for websites with greater depth.

Sample for a page: Jive community website

For this page, when I view the source of the page, I can see only the post and not the comments. Think this is because comments are fetched through an AJAX call to the server.

Hence, when I use jsoup, it doesn't fetch the comments.

So how can I automate the process of fetching posts and comments?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Jsoup is a html parser only. Unfortunately it's not possible to parse any javascript / ajax content, since jsoup can't execute those.

The solution: using a library which can handle Scripts.

Here are some examples i know:

If such a library doesn't support parsing or selectors, you can at least use them to get Html out of the scripts (which then can be parsed by jsoup).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...