Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Twitter likes scraper not working properly

I've been trying to get a Python scraper from a tutorial to download all my liked media on Twitter (reference images and videos I forgot to download). When I run it, it returns no information or media, unlike on other websites I've used it on, and I can't tell what's tripping it up. Is there any way to fix this? This is the code I'm currently using:

import os
import requests as r
from bs4 import BeautifulSoup


# Request data from url
request = r.get('my twitter url')
soup = BeautifulSoup(request.text, "html.parser")

# source the images link which is to be downloaded
x = soup.select('img[src^="https://pbs.twimg.com/media/"]')

# generate links from the which the images are to be downloaded
links = []
for img in x:
    links.append(img['src'])

# Create directory where the downloaded images are to be written
path = 'photos'
isDir = os.path.isdir(path)
if isDir:
    print('Required directory is already available. Skipping folder creation..\n')
else:
    print('Creating a directory\n')
    os.mkdir('photos')

# Generate and save only up to 10 images to test code
i = 1
for index, img_link in enumerate(links):
    if i <= 10:
        print(f'Generating file {i}.jpg')
        img_data = r.get(img_link).content
        with open("photos/" + str(index + 1) + '_' + '.jpg', 'wb+') as f:
            f.write(img_data)
        i += 1
    else:
        break
question from:https://stackoverflow.com/questions/65838148/twitter-likes-scraper-not-working-properly


1 Answer


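Twitter posts are rendered by JavaScript. requests.get only downloads the initial HTML; it does not run JavaScript or load any dynamic content, so the images are never in the response. (Hint: use view-source on the page to see what requests gets. The inspector shows the page after JavaScript has run.)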
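
To see the difference without touching the network, here is a minimal sketch. RAW_HTML is a made-up stand-in for what a JavaScript-rendered page might send to requests.get (it is not Twitter's real markup), and find_img_sources is a hypothetical helper built on the standard library's html.parser so it runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw HTML a JavaScript-rendered page returns:
# an empty container plus a script that would fill it in inside a browser.
RAW_HTML = """
<html>
  <body>
    <div id="react-root"></div>
    <script>
      // A browser would run this and insert the image; requests never does.
      const img = document.createElement("img");
      img.src = "https://pbs.twimg.com/media/example.jpg";
      document.getElementById("react-root").appendChild(img);
    </script>
  </body>
</html>
"""


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag present in the markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_img_sources(html):
    """Return the image URLs that exist as real tags in the given HTML."""
    parser = ImgSrcCollector()
    parser.feed(html)
    return parser.sources


# The script body is plain text to the parser, so no <img> tag is found.
print(find_img_sources(RAW_HTML))  # prints []
```

The empty list is the same result the selector in the question gets: the image only exists after a browser has executed the script.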

You may want to look into something like Selenium instead. It is a browser automation tool that loads pages in a real browser, so dynamic content and JavaScript should work the same as they do in your own browser.

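from bs4 import BeautifulSoup
from selenium import webdriver

# Start a real Chrome instance so the page's JavaScript actually runs
driver = webdriver.Chrome()

driver.get('my twitter url')

# page_source is the HTML after the browser has rendered the page
soup = BeautifulSoup(driver.page_source, 'html.parser')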
...

Note that you'll need to install selenium and an accompanying webdriver for the browser you want (e.g. chromedriver for Chrome, geckodriver for Firefox, etc.). Recent Selenium releases (4.6 and later) can download a matching driver automatically.
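
One thing to watch for once the page loads: a timeline usually fetches more posts only as you scroll, and posts that scroll out of view may be dropped from the page again. A rough sketch of one way to handle that is below. It assumes the driver is a Selenium WebDriver (anything with page_source and execute_script works), and collect_while_scrolling, extract_links, scrolls and pause are hypothetical names chosen for illustration. The page may also require a logged-in session, which this sketch does not handle.

```python
import time


def collect_while_scrolling(driver, extract_links, scrolls=5, pause=2.0):
    """Gather links from the rendered page before and after each scroll.

    driver:        object with .page_source and .execute_script(), such as
                   selenium's webdriver.Chrome() (assumed, not imported here)
    extract_links: callable taking an HTML string and returning a list of
                   links, e.g. a wrapper around the soup.select call
    scrolls:       how many times to scroll to the bottom of the page
    pause:         seconds to wait after each scroll for new content
    """
    links = []
    for step in range(scrolls + 1):
        # Read whatever is currently rendered and keep unseen links in order.
        for link in extract_links(driver.page_source):
            if link not in links:
                links.append(link)
        if step < scrolls:
            # Scroll to the bottom so the page requests the next batch.
            driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);"
            )
            time.sleep(pause)
    return links
```

With the code from the question, extract_links could be a small function that builds BeautifulSoup(html, 'html.parser') and returns img['src'] for each match of the img[src^="https://pbs.twimg.com/media/"] selector. A fixed pause is the simplest option. Selenium's explicit waits are more reliable if the page loads slowly.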

