
selenium - scraping Python: existing connection was forcibly closed by the remote host

I'm scraping this website https://www.bi.go.id/id/statistik/informasi-kurs/transaksi-bi/Default.aspx with Selenium and bs4 to get the kurs (exchange rate) price table. It looks like I'm getting blocked by that website, because when I try to get the table I get an error like this:

ChunkedEncodingError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
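For reference, the 10054 in this traceback is Windows' "connection reset by peer": the server closed the TCP connection mid-response, which usually points to rate limiting or bot filtering rather than a network fault. A minimal sketch to test the bare request outside Selenium, retrying with backoff and the same browser User-Agent used in the code below (this is just a diagnostic, not the fix):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://www.bi.go.id/id/statistik/informasi-kurs/transaksi-bi/Default.aspx"
session = requests.Session()
# retry up to 3 times with increasing backoff in case the reset is transient
session.mount("https://", HTTPAdapter(max_retries=Retry(total=3, backoff_factor=1)))
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"}
resp = session.get(url, headers=headers, timeout=30)
print(resp.status_code, len(resp.text))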

And this is my code:

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.bi.go.id/id/statistik/informasi-kurs/transaksi-bi/Default.aspx")

wait = WebDriverWait(driver, 10)

driver.implicitly_wait(10)  # secs


# choose "range" as the period
book = wait.until(EC.element_to_be_clickable((By.ID, "selectPeriod")))
sel = Select(book)
sel.select_by_value("range")

# choose "USD" as the currency (the option value really has trailing spaces)
bookk = wait.until(EC.element_to_be_clickable((By.ID, "ctl00_PlaceHolderMain_g_6c89d4ad_107f_437d_bd54_8fda17b556bf_ctl00_ddlmatauang1")))
sel = Select(bookk)
sel.select_by_value("USD  ")

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# fill in the date range; the fields expect the dd-Mon-yy format
start_date = driver.find_element_by_id("ctl00_PlaceHolderMain_g_6c89d4ad_107f_437d_bd54_8fda17b556bf_ctl00_txtFrom")
start_date.send_keys("20-Nov-15")
end_date = driver.find_element_by_id("ctl00_PlaceHolderMain_g_6c89d4ad_107f_437d_bd54_8fda17b556bf_ctl00_txtTo")
end_date.send_keys(time.strftime("%d-%b-%y"))  # dd-Mon-yy, same format as the start date

time.sleep(5)
buttons = driver.find_elements_by_xpath("//input[@value='Cari']")  # "Cari" = "Search"
buttons[1].click()

src = driver.page_source  # HTML source of the page as rendered by Selenium
headers = {
    #"Referer": "https://id.investing.com/commodities/gold-historical-data",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0",
    "X-Requested-With": "XMLHttpRequest"
}
parser = BeautifulSoup(src, "lxml")  # initialize the parser and parse the source "src"

# this plain GET is where the ConnectionResetError is raised; note that its
# result (html) is never used below -- the table comes from the Selenium source
url = "https://www.bi.go.id/id/statistik/informasi-kurs/transaksi-bi/Default.aspx"
r = requests.get(url, headers=headers)
html = r.text

table = parser.find("table", attrs={"class": "table1"})  # the kurs table
rows = table.find_all('tr')
data = []
for row in rows[1:]:  # skip the header row
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

result = pd.DataFrame(data, columns=['nilai', 'kurs_jual', 'kurs_beli', 'tanggal'])
result.to_csv("kurs1.csv", index=False)

df = pd.read_csv("kurs1.csv")
pd.set_option('display.max_rows', df.shape[0] + 1)
print(df)

What should I do? Please help me. A month ago I had already succeeded, but then the classes and ids on that website changed, so I had to update them all; when I try to run it again, I get this connection error. I've been stuck for weeks! Thank you in advance. The CSV file should have come out like before, with the nilai, kurs_jual, kurs_beli, and tanggal columns.
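One detail worth noting in the code above: the table is parsed from driver.page_source, which Selenium has already rendered, so the separate requests.get call (the line where the ChunkedEncodingError is actually raised) is never used afterwards. A minimal sketch of the same extraction with that call dropped, assuming the Selenium session is already on the results page and the table still has class "table1":

import pandas as pd
from bs4 import BeautifulSoup

# src = driver.page_source, captured after clicking "Cari" as in the code above
parser = BeautifulSoup(src, "lxml")
table = parser.find("table", attrs={"class": "table1"})
# read_html turns the table element straight into a DataFrame (needs lxml installed)
df = pd.read_html(str(table))[0]
df.to_csv("kurs1.csv", index=False)
print(df)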

question from: https://stackoverflow.com/questions/65894976/scraping-python-existing-connection-was-forcibly-closed-by-the-remote-host


1 Answer

Waiting for answers.
