Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.7k views
in Technique[技术] by (71.8m points)

web scraping - how to get tbody from table from python beautiful soup ?

I'm trying to scrap Year & Winners ( first & second columns ) from "List of finals matches" table (second table) from http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals: I'm using the code below:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())
soup.findAll('table')[0].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

With the above code, I was able to get first & thrid column just fine. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, It could not find tbody as its element, but I can see the tbody when I inspect the element.

url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())

print soup.findAll('table')[2]

    soup.findAll('table')[2].tbody.findAll('tr')
    for row in soup.findAll('table')[0].tbody.findAll('tr'):
        first_column = row.findAll('th')[0].contents
        third_column = row.findAll('td')[2].contents
        print first_column, third_column

Here's what I got from comment error:

'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8 
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents

AttributeError: 'NoneType' object has no attribute 'findAll'

'
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If you are inspecting through the inspect tool in the browser it will insert the tbody tags.

The source code, may, or may not contain them. I suggest looking at the source view if you really want to know.

Either way, you do not need to traverse to the tbody, simply:

soup.findAll('table')[0].findAll('tr') should work.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...