Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Why does this website keep sending me back to the homepage when I open developer tools or use Selenium, and return an unknown response when I use requests?

For the past 3 months or so I have been learning Python and doing some web scraping. My main goal right now is to automate downloading comic (manga) images from this website using the requests, BeautifulSoup4 and Selenium modules. But when I open, let's say, this page of that website and try to open Developer Tools, the page automatically sends me back to the homepage. This particular website seems to use very heavy JavaScript.

My FIRST attempt was to open the page in Chrome and then open Developer Tools, but it failed as described above.

SECOND, I opened the page and tried to pause JavaScript by pressing F8 and various Ctrl shortcuts, but none of them worked.

THIRD, I tried different browsers such as Chrome Canary and Microsoft Edge, because I read that those browsers can prevent JavaScript from loading or opening a new tab (or something similar), but again, when I opened Developer Tools the page sent me back to the homepage.

FOURTH, I tried to save the page (Ctrl + S did not work, so I had to go through Chrome's three-dot menu > More tools > Save page as, as both HTML and PDF), but when I opened the saved copy, the comic images had disappeared or had not been downloaded.

FIFTH, I tried to download the page with requests and let BeautifulSoup4 do the rest:

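import requests
from bs4 import BeautifulSoup as bf

# Fetch the chapter page and parse whatever HTML the server returns
R = requests.get('https://mangaku.pro/one-piece-chapter-1000-5/')
soup = bf(R.text, 'lxml')  # the 'lxml' parser requires the lxml package
print(soup)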

but this is the result:

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]--><!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]--><!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]--><!--[if gt IE 8]><!--><html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<meta content="0" http-equiv="refresh"/>
<title>mangaku.pro | 520: Web server is returning an unknown error</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="noindex, nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/main.css" id="cf_styles-css" media="screen,projection" rel="stylesheet" type="text/css"/>
...
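The response above is not the chapter page at all: its `<title>` reports a Cloudflare "520" error and the page carries `<meta http-equiv="refresh" content="0">`, so BeautifulSoup is only parsing an error page. A minimal standard-library sketch that flags such a page before scraping it is shown below. The helper name `inspect_page` and the assumption that Cloudflare error pages put "NNN: ..." in the title are illustrative, not anything the site documents.

```python
import re
from html.parser import HTMLParser


class _TitleAndRefresh(HTMLParser):
    """Collects the <title> text and notes any <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.has_refresh = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.has_refresh = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_page(html):
    """Return (error_code, has_meta_refresh) for a fetched HTML page.

    error_code is the first "5xx:" number found in <title>, or None.
    Assumption: Cloudflare error pages title themselves "<host> | NNN: ...".
    """
    parser = _TitleAndRefresh()
    parser.feed(html)
    match = re.search(r"\b(5\d\d):", parser.title)
    code = int(match.group(1)) if match else None
    return code, parser.has_refresh
```

Called as `inspect_page(R.text)` on the response above, it should report a 520 code with a meta refresh present; checking `R.status_code` first is a cheaper signal worth trying as well.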

SIXTH, I tried Selenium, planning to grab the page source and let BeautifulSoup4 do the rest. But...

from selenium import webdriver

# Open the chapter page in a real Chrome browser controlled by Selenium
driver = webdriver.Chrome()
driver.get("https://mangaku.pro/one-piece-chapter-1000-5/")

Even before the page has finished loading, it sends the driver TO THE HOMEPAGE again.
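To detect the bounce programmatically rather than by watching the browser, one option is to compare the requested URL with Selenium's `driver.current_url` after loading. A minimal sketch follows, assuming a redirect shows up as a change of host or path (for example to `/`); `was_redirected` is a hypothetical helper, and trailing slashes and query strings are treated as insignificant.

```python
from urllib.parse import urlsplit


def was_redirected(requested_url, current_url):
    """Return True if the browser ended up on a different host or path.

    Assumption: trailing slashes and query strings do not matter here.
    """
    want = urlsplit(requested_url)
    got = urlsplit(current_url)
    same_host = want.netloc.lower() == got.netloc.lower()
    same_path = want.path.rstrip("/") == got.path.rstrip("/")
    return not (same_host and same_path)
```

With Selenium it would be called as `was_redirected(url, driver.current_url)` right after `driver.get(url)`. Because the redirect is triggered by the page's own JavaScript, a check made too early may still see the original URL.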

I can get the images by screenshotting the page with a Chrome extension, but then I have to crop each image manually. That is not efficient and is far from my main goal, which is to automate the whole process.
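If a copy of the chapter's real HTML can be captured (for example from Selenium's `driver.page_source` before the redirect fires), the screenshot-and-crop step could be replaced by collecting the image URLs directly. A minimal standard-library sketch is below. It assumes the comic pages are ordinary `<img>` tags, possibly lazy-loaded through a `data-src` attribute; that is a common pattern but is not confirmed for this site, and `image_urls` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ImgCollector(HTMLParser):
    """Collects image URLs from <img> tags, preferring data-src over src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Assumption: lazy-loaded pages keep the real image URL in data-src.
        src = attrs.get("data-src") or attrs.get("src")
        # Skip inline placeholder images such as "data:image/gif;base64,..."
        if src and not src.startswith("data:"):
            url = urljoin(self.base_url, src.strip())
            if url not in self.urls:
                self.urls.append(url)


def image_urls(html, base_url):
    """Return absolute image URLs in document order, without duplicates."""
    collector = _ImgCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Each URL could then be fetched with `requests.get(url).content` and written to a file. Some image hosts reject requests that lack a `Referer` header, which is another assumption to test against this site.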

Thank you so much. I really appreciate any help!

Question from: https://stackoverflow.com/questions/65623568/this-website-keep-send-me-back-to-homepage-when-i-open-developer-tools-or-when-i

He who fights with dragons too long becomes a dragon himself; gaze into the abyss too long, and the abyss will gaze back…