Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How to Get data-* attributes when web scraping using python requests (Python Requests Creating Some Issues)

How can I get the value of data-d1-value when using Python's requests library?

The response from requests.get(url) does not contain the data-* attributes of the div, even though they are present in the original web page.

The web page is as follows:

<div id="test1" class="class1" data-d1-value="150">
180
</div>

The code I am using is:

import requests
from bs4 import BeautifulSoup

req = requests.get(url)  # url is the page address given further down
soup = BeautifulSoup(req.text, 'lxml')
d1_value = soup.find('div', {'class':"class1"})
print(d1_value)

The result I get is:

<div id="test1" class="class1">
180
</div>

When I debug this, I find that requests.get(url) does not return the full div: only the id and class come back, not the data-* attributes.

How should I modify my code to get the full value?

For a concrete example, in my case the URL is: https://www.moneycontrol.com/india/stockpricequote/oil-drillingexploration/oilnaturalgascorporation/ONG

The div has class="inprice1 nsecp", and the value of data-numberanimate-value is what I am trying to fetch.

Thanks in advance :)

question from:https://stackoverflow.com/questions/65843623/how-to-get-data-attributes-when-web-scraping-using-python-requests-python-req


1 Answer


EDIT

The website's response differs depending on how the page is requested. In your case, when using requests, the value you are looking for is served like this:

<div class="inprice1 nsecp" id="nsecp" rel="92.75">92.75</div>

So you can get it from the rel or from the text:

soup.find('div', {'class':"inprice1"})['rel']
soup.find('div', {'class':"inprice1"}).get_text()

Example

import requests
from bs4 import BeautifulSoup

req = requests.get('https://www.moneycontrol.com/india/stockpricequote/oil-drillingexploration/oilnaturalgascorporation/ONG')

soup = BeautifulSoup(req.text, 'lxml')

print('rel: ' + soup.find('div', {'class':"inprice1"})['rel'])
print('text: ' + soup.find('div', {'class':"inprice1"}).get_text())

Output

rel: 92.75
text: 92.75
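
Since the same div can arrive with or without the data-* attribute depending on how the page was fetched, a small fallback helper may be convenient. This is only a sketch: the function name extract_price and the lookup order (data-numberanimate-value, then rel, then the text) are assumptions for illustration, and it uses the built-in 'html.parser' backend so it runs without lxml.

```python
from bs4 import BeautifulSoup


def extract_price(html, css_class="inprice1"):
    """Return the price as a string, or None if no matching div exists.

    Hypothetical helper: tries the data-* attribute first, then rel,
    then falls back to the visible text of the div.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=css_class)
    if div is None:
        return None
    for attr in ("data-numberanimate-value", "rel"):
        value = div.get(attr)  # .get() returns None instead of raising KeyError
        if isinstance(value, list):
            # Some attributes are parsed as lists on certain tags
            value = " ".join(value)
        if value:
            return value
    return div.get_text(strip=True)


# The markup that requests received, as shown above
static_html = '<div class="inprice1 nsecp" id="nsecp" rel="92.75">92.75</div>'
print(extract_price(static_html))  # 92.75
```

Using .get() rather than square brackets means a missing attribute yields None instead of a KeyError, which is what makes the fallback chain possible.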

To get a response that matches the source you see when inspecting the page in the browser, you can use Selenium.

Example

from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep

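# Adjust the path to wherever chromedriver is installed on your machine.
# Note: recent Selenium 4 releases expect a Service object instead of executable_path.
driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')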
url = "https://www.moneycontrol.com/india/stockpricequote/oil-drillingexploration/oilnaturalgascorporation/ONG"

driver.get(url)
sleep(2)

soup = BeautifulSoup(driver.page_source, "lxml")
print(soup.find('div', class_='inprice1 nsecp')['data-numberanimate-value'])
driver.close()
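
Before starting a browser, it can be worth confirming which data-* attributes are actually present in the HTML you already have. The sketch below uses only the standard library; the names DataAttrCollector and data_attributes are made up for this illustration, and the two input strings are the snippets quoted in this thread.

```python
from html.parser import HTMLParser


class DataAttrCollector(HTMLParser):
    """Collect every data-* attribute found in an HTML string."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute name, value) tuples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("data-"):
                self.found.append((tag, name, value))


def data_attributes(html):
    """Return all data-* attributes in html as (tag, name, value) tuples."""
    collector = DataAttrCollector()
    collector.feed(html)
    collector.close()
    return collector.found


# What requests received for the real page (from the answer above)
raw = '<div class="inprice1 nsecp" id="nsecp" rel="92.75">92.75</div>'
# The simplified sample from the question
sample = '<div id="test1" class="class1" data-d1-value="150">180</div>'

print(data_attributes(raw))     # []
print(data_attributes(sample))  # [('div', 'data-d1-value', '150')]
```

An empty list for the text returned by requests suggests the attribute is added after the page loads, which is the case where a browser-driven approach is needed.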

To get the attribute value when it is present in the HTML, just index the result of find() with ['data-d1-value'].

Example

from bs4 import BeautifulSoup

html='''
<div id="test1" class="class1" data-d1-value="150">
180
</div>
'''

soup = BeautifulSoup(html, 'lxml')
d1_value = soup.find('div', {'class':"class1"})['data-d1-value']
print(d1_value)
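
A few variations on the same lookup, as a hedged sketch: .get() with a default for attributes that may be missing, .attrs to list every data-* attribute on the tag, and a CSS attribute selector. The attribute name data-missing is invented here purely to show the default value, and 'html.parser' is used so the snippet does not depend on lxml.

```python
from bs4 import BeautifulSoup

html = '''
<div id="test1" class="class1" data-d1-value="150">
180
</div>
'''

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "class1"})

# .get() returns None (or the given default) when the attribute is absent
print(div.get("data-d1-value"))        # 150
print(div.get("data-missing", "n/a"))  # n/a

# .attrs is a dict of all attributes; filter it for the data-* ones
data_attrs = {k: v for k, v in div.attrs.items() if k.startswith("data-")}
print(data_attrs)  # {'data-d1-value': '150'}

# Select the element by the presence of the attribute itself
by_selector = soup.select_one("div[data-d1-value]")
print(by_selector["data-d1-value"])  # 150
```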
