Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

selenium - BeautifulSoup Python NoneType object has no attribute 'text'

I'm trying to scrape a JavaScript-loaded website, https://e-consulta.sunat.gob.pe/cl-at-ittipcam/tcS01Alias, using Selenium and BeautifulSoup 4.

However, when I try to retrieve an element or sub-item (a sub-branch) from the tree, I get this error:

bloquefecha=bloque.find('div[@class="date"]').text

AttributeError: 'NoneType' object has no attribute 'text'

I'm attaching a snapshot of my code and the developer console for illustrative purposes.

Here is my code:

import time

from bs4 import BeautifulSoup
from selenium import webdriver

def beautifulseleniumsunat2():
    navegador = webdriver.Chrome()
    navegador.get("https://e-consulta.sunat.gob.pe/cl-at-ittipcam/tcS01Alias")
    time.sleep(7)  # wait 7 seconds for the page to load
    pagsunat = navegador.page_source
    soup = BeautifulSoup(pagsunat, "html.parser")
    print (soup.prettify())

    bloquesdias2 = soup.select('td[class*="table-bordered calendar-day current"]')
    listafecha = []
    listacompra=[]
    listaventa=[]
    for bloque in bloquesdias2:
        bloquefecha=bloque.find('div[@class="date"]') #ALSO tried with findall and iterating with FOR loop on each element but ERROR says it's not iterable
        listafecha.append(bloquefecha.text)
        bloquecompra=bloque.find('div[@class="event normal-all-day begin end"]') #ALSO tried with findall and iterating with FOR loop on each element but ERROR says it's not iterable
        listacompra.append(bloquecompra.text)
        bloqueventa = bloque.find('div[@class="event pap-all-day begin end"]') #ALSO tried with findall and iterating with FOR loop on each element but ERROR says it's not iterable
        listaventa.append(bloquecompra.text)

    listafinal=[listacompra,listaventa,listafecha]
    print (listafinal)
question from:https://stackoverflow.com/questions/65878205/beautifulsoup-python-nonetype-object-has-no-attribute-text


1 Answer


What happens?

As mentioned by Aziz Sonawalla, you have to pass the class as a separate argument to find(), but that won't fix all your issues: if an element is not available, e.g. if there is no compra / venta entry for a day, the lookup returns None and the same error is raised again.
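
To illustrate the first point, here is a minimal sketch on a hand-written HTML fragment. The markup is an assumption that loosely mimics one calendar cell; it is not copied from the SUNAT page. find() treats its first argument as a tag name, so an XPath-style string matches nothing and returns None, while the class_ keyword or a CSS selector passed to select_one() finds the element.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that loosely mimics one calendar cell
html = """
<td class="table-bordered calendar-day current">
  <div class="date">5</div>
  <div class="event normal-all-day begin end">Compra 3.624</div>
</td>
"""
soup = BeautifulSoup(html, "html.parser")
cell = soup.find("td")

# The XPath-style string is compared against tag names, so nothing matches
wrong = cell.find('div[@class="date"]')

# Tag name plus the class_ keyword argument
by_keyword = cell.find("div", class_="date")

# The same lookup written as a CSS selector
by_selector = cell.select_one("div.date")

print(wrong)                   # None
print(by_keyword.get_text())   # 5
print(by_selector.get_text())  # 5
```

Calling .text on the first result is what produces the AttributeError from the question.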

How to fix that?

You have to catch the error: the try block gives you the result if there is no error, and the except block sets the result to an empty string.

try:
    bloquecompra = day.select_one('div[class*="normal-all-day"]').get_text().split()[1]
except (AttributeError, IndexError):  # element missing, or text has no second word
    bloquecompra = ''
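
An alternative to try/except, if you prefer to make the missing-element case explicit, is to check for None before reading the text. This is only a sketch: the helper name second_word_or_empty and the sample markup (including the "Compra 3.618" text layout) are assumptions for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup

def second_word_or_empty(parent, selector):
    """Return the second whitespace-separated word of the matched element's text, or ''."""
    element = parent.select_one(selector)
    if element is None:  # element missing, e.g. a day without a published rate
        return ''
    words = element.get_text().split()
    return words[1] if len(words) > 1 else ''

# Hypothetical cells: one with a rate, one without
html = """
<table><tr>
<td class="current"><div class="date">1</div><div class="event normal-all-day begin end">Compra 3.618</div></td>
<td class="current"><div class="date">2</div></td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
days = soup.select('td[class*="current"]')
results = [second_word_or_empty(day, 'div[class*="normal-all-day"]') for day in days]
print(results)  # ['3.618', '']
```

The same helper can be reused for the venta column by passing a different selector.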

Example

You can replace all your code after print (soup.prettify()) with the following:

data = []

# One iteration per day cell of the current month
for day in soup.select('table.calendar-table.table.table-condensed > tbody td[class*="current"]'):
    bloquefecha = day.select_one('div.date').get_text()
    try:
        bloquecompra = day.select_one('div[class*="normal-all-day"]').get_text().split()[1]
    except (AttributeError, IndexError):  # no compra entry for this day
        bloquecompra = ''

    try:
        bloqueventa = day.select_one('div[class*="pap-all-day"]').get_text().split()[1]
    except (AttributeError, IndexError):  # no venta entry for this day
        bloqueventa = ''

    data.append(';'.join([bloquefecha,bloquecompra,bloqueventa]))
data

Output

['1;3.618;3.624',
 '2;;',
 '3;;',
 '4;;',
 '5;3.624;3.628',
 '6;3.627;3.631',
 '7;3.625;3.630',
 '8;3.620;3.623',
 '9;3.610;3.615',
 '10;;',
 '11;;',
 '12;3.615;3.618',
 '13;3.606;3.608',
 '14;3.610;3.615',
 '15;3.610;3.613',
 '16;3.610;3.614',
 '17;;',
 '18;;',
 '19;3.609;3.617',
 '20;3.611;3.615',
 '21;3.612;3.615',
 '22;3.618;3.622',
 '23;;',
 '24;;',
 '25;;',
 '26;;',
 '27;;',
 '28;;',
 '29;;',
 '30;;',
 '31;;']
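
If you need numbers rather than strings downstream, the joined rows can be split back into fields. A minimal sketch, assuming the 'day;compra;venta' layout shown in the output above; the function name parse_row is hypothetical, and empty fields are mapped to None.

```python
def parse_row(row):
    """Split 'day;compra;venta' into (int, float or None, float or None)."""
    day, compra, venta = row.split(';')

    def to_float(value):
        # Empty string means no rate was published for that day
        return float(value) if value else None

    return int(day), to_float(compra), to_float(venta)

# A few rows taken from the output above
sample = ['1;3.618;3.624', '2;;', '5;3.624;3.628']
parsed = [parse_row(row) for row in sample]
print(parsed)  # [(1, 3.618, 3.624), (2, None, None), (5, 3.624, 3.628)]
```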
