Webscraping Blockchain data seemingly embedded in Javascript through Python, is this even the right approach?

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I'm referencing this url: https://tracker.icon.foundation/block/29562412

If you scroll down to "Transactions", it shows 2 transactions with separate links; those links are what I'm trying to grab. If I try a simple pd.read_csv(url) command, it clearly omits the data I'm looking for. I thought the content might be JavaScript-based, so I tried the following code instead:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://tracker.icon.foundation/block/29562412')
r.html.links
r.html.absolute_links

and I get the result "set()" even though I was expecting the following:

['https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632', 'https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e']

Is JavaScript even the right approach? I tried BeautifulSoup instead and found no cigar on that end as well.

question from: https://stackoverflow.com/questions/65836519/webscraping-blockchain-data-seemingly-embedded-in-javascript-through-python-is

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:35:15+0000

You're right. This page is populated asynchronously using JavaScript. BeautifulSoup and similar tools only parse the HTML the server returns and never execute scripts, so they won't be able to see the specific content you're trying to scrape.
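To see why the link set comes back empty, here is a small offline illustration using only the standard library's html.parser. The HTML string and the loadTransactions name are made up for the example; they are not the tracker's actual markup, just an assumed stand-in for a page whose links are added by a script.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty container, and a script
# would add the links only after running in a browser.
STATIC_HTML = """
<html><body>
  <div id="transactions"></div>
  <script>
    // In a browser this would fetch JSON and insert anchor elements.
    loadTransactions();
  </script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(
                value for name, value in attrs if name == "href" and value
            )


collector = LinkCollector()
collector.feed(STATIC_HTML)

# The parser never runs the script, so no anchors are found.
print(set(collector.links))
```

A parser that does not execute JavaScript sees only the empty container, which matches the "set()" result in the question.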

However, if you log your browser's network traffic, you can see some (XHR) HTTP GET requests being made to a REST API, which serves its results in JSON. This JSON happens to contain the information you're looking for. It actually makes several such requests to various API endpoints, but the one we're interested in is called txList (short for "transaction list" I'm guessing):

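import sys

import requests


def main():
    # JSON endpoint the block page calls to load its transaction list.
    url = "https://tracker.icon.foundation/v3/block/txList"

    params = {
        "height": "29562412",  # block height taken from the page URL
        "page": "1",
        "count": "10"
    }

    # A timeout keeps the script from hanging if the server stops responding.
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()

    base_url = "https://tracker.icon.foundation/transaction/"

    # Each entry in "data" describes one transaction in the block.
    for transaction in response.json()["data"]:
        print(base_url + transaction["txHash"])

    return 0


if __name__ == "__main__":
    sys.exit(main())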

Output:

https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632
https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e
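
If you want the links as a list rather than printed output, the extraction step can be pulled into a small function. This is a minimal sketch, assuming the response keeps the shape used above (a top-level "data" list whose items carry "txHash"). The helper name transaction_links and the trimmed sample payload are illustrative and not part of the tracker's API; a real response carries more fields.

```python
BASE_URL = "https://tracker.icon.foundation/transaction/"


def transaction_links(payload, base_url=BASE_URL):
    """Return one transaction URL per entry in payload["data"]."""
    # "or []" tolerates a missing or null "data" key.
    return [base_url + tx["txHash"] for tx in payload.get("data") or []]


# Sample payload trimmed to the one field used here (assumed shape);
# the hashes are the two from the output above.
sample = {
    "data": [
        {"txHash": "0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632"},
        {"txHash": "0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e"},
    ]
}

links = transaction_links(sample)
for link in links:
    print(link)
```

With a live request, you would pass response.json() to the function in place of the sample payload.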
