jsoup - Fetch contents(loaded through AJAX call) of a web page

Question


asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)


I am a beginner at crawling. I need to fetch the posts and comments from a link, and I want to automate this process. I considered using a web crawler and jsoup, but was told that web crawlers are mostly used for websites with greater depth.

Sample page: a Jive community website

For this page, when I view the page source, I can see only the post, not the comments. I think this is because the comments are fetched through an AJAX call to the server.

As a result, jsoup doesn't fetch the comments.

So how can I automate the process of fetching posts and comments?


1 Answer

深蓝 · Answer 1 · 2021-10-17T00:08:24+0000

Jsoup is only an HTML parser. Unfortunately, it cannot handle content loaded by JavaScript/AJAX, because jsoup does not execute scripts.
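A small sketch makes the limitation concrete. It parses a hypothetical page (the markup, the `div.post`/`div.comment` class names and the `/comments` endpoint are all made up for illustration) whose comment container would be filled by a script. jsoup keeps the `<script>` contents as plain data and never runs them, so the comment selector matches nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticParseDemo {

    // Hypothetical markup: the post is in the static HTML, while the comments
    // would only appear after the script runs (jsoup never runs it).
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<div class='post'>Original post text</div>"
        + "<div id='comments'></div>"
        + "<script>fetch('/comments').then(r => r.text())"
        + ".then(html => { document.getElementById('comments').innerHTML = html; });</script>"
        + "</body></html>";

    /** Counts elements matching a CSS selector in the HTML exactly as jsoup parses it. */
    static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println("posts:    " + count(SAMPLE_HTML, "div.post"));    // 1
        System.out.println("comments: " + count(SAMPLE_HTML, "div.comment")); // 0, the script was never executed
    }
}
```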

The solution is to use a library that can execute scripts.

Here are some examples I know:

If such a library doesn't support parsing or selectors, you can at least use it to get the HTML produced by the scripts, which jsoup can then parse.
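The "render first, then parse with jsoup" approach might look like the following minimal sketch. It assumes HtmlUnit 3.x (packages under `org.htmlunit`; HtmlUnit 2.x uses `com.gargoylesoftware.htmlunit` instead) and jsoup on the classpath. The class and method names, the 10-second wait and the `div.comment` selector are hypothetical placeholders; inspect the rendered page to find the real selector for a given forum.

```java
import org.htmlunit.BrowserVersion;
import org.htmlunit.NicelyResynchronizingAjaxController;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.io.UncheckedIOException;

public class RenderedPageFetcher {

    /** Hypothetical helper: a headless WebClient with JavaScript on and script errors tolerated. */
    static WebClient newClient() {
        WebClient client = new WebClient(BrowserVersion.CHROME);
        client.getOptions().setJavaScriptEnabled(true);
        client.getOptions().setCssEnabled(false);                  // styling is irrelevant for scraping
        client.getOptions().setThrowExceptionOnScriptError(false); // real pages often contain broken scripts
        client.setAjaxController(new NicelyResynchronizingAjaxController());
        return client;
    }

    /**
     * Hypothetical helper: loads the page, lets background JavaScript (timers, AJAX
     * callbacks) run for up to waitMillis, then hands the resulting DOM to jsoup.
     */
    static Document fetchRendered(WebClient client, String url, long waitMillis) {
        try {
            HtmlPage page = client.getPage(url);
            client.waitForBackgroundJavaScript(waitMillis);
            // asXml() serialises the DOM after the scripts have modified it.
            return Jsoup.parse(page.asXml(), page.getUrl().toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String url = args[0]; // the thread URL to scrape
        try (WebClient client = newClient()) {
            Document doc = fetchRendered(client, url, 10_000);
            // "div.comment" is a placeholder selector, not taken from any real site.
            for (Element comment : doc.select("div.comment")) {
                System.out.println(comment.text());
            }
        }
    }
}
```

Usage would be something like `java RenderedPageFetcher <thread-url>`. Once the scripts have run, jsoup sees the same DOM a browser would, so the usual `select(...)` calls work. A browser-automation tool such as Selenium WebDriver could fill the same role, with `driver.getPageSource()` passed to `Jsoup.parse`.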
