Scraping links using bs4 and python

Question

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I want to extract the links from a website using bs4, and I'd like to avoid regex.

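import requests
from bs4 import BeautifulSoup

def generate_url(day, year, month):
    # Despite its name, this fetches the box-score page and returns the parsed soup
    url = f"http://hockey-reference.com/boxscores/?year={year}&month={month}&day={day}"
    page = requests.get(url)
    page.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    # 'lxml' requires the lxml package; 'html.parser' is the built-in alternative
    soup = BeautifulSoup(page.content, 'lxml')
    return soup

soup = generate_url(13, 2021, 1)
# Every <td class="right gamelink"> cell on the page
html_links = soup.find_all('td', class_='right gamelink')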

The result is a list of <td> tags, with the HTML still embedded:

[<td class="right gamelink">
<a href="/boxscores/202101130COL.html">F<span class="no_mobile">inal</span></a>
</td>,
<td class="right gamelink">
<a href="/boxscores/202101130EDM.html">F<span class="no_mobile">inal</span></a>
</td>,
<td class="right gamelink">
<a href="/boxscores/202101130PHI.html">F<span class="no_mobile">inal</span></a>
</td>,
<td class="right gamelink">
<a href="/boxscores/202101130TBL.html">F<span class="no_mobile">inal</span></a>
</td>,
<td class="right gamelink">
<a href="/boxscores/202101130TOR.html">F<span class="no_mobile">inal</span></a>
</td>]

What is the best way to extract just the links (the href values) from these tags?

Question from: https://stackoverflow.com/questions/65865081/scraping-links-using-bs4-and-python

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:25:55+0000

Extend your code by iterating over html_links and reading the href attribute of the <a> tag inside each cell:

base_url = 'http://hockey-reference.com'
for html_link in html_links:
    # find('a') returns the first <a> inside the <td>; ['href'] reads its attribute
    link = html_link.find('a')['href']
    print(base_url + link)
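A variant of the loop above, sketched under a few assumptions: it uses a CSS selector via soup.select() to reach the <a> tags directly instead of going through the <td> list first, and urllib.parse.urljoin instead of string concatenation to build absolute URLs. So that it runs without network access, it parses a copy of the markup shown in the question with Python's built-in 'html.parser'. The names extract_game_links, SAMPLE_HTML and BASE_URL are illustrative only and do not appear in either original snippet.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Illustrative sample: markup copied from the question's output so the sketch runs offline.
SAMPLE_HTML = """
<table><tr>
<td class="right gamelink"><a href="/boxscores/202101130COL.html">F<span class="no_mobile">inal</span></a></td>
<td class="right gamelink"><a href="/boxscores/202101130EDM.html">F<span class="no_mobile">inal</span></a></td>
<td class="right gamelink"><a href="/boxscores/202101130PHI.html">F<span class="no_mobile">inal</span></a></td>
</tr></table>
"""

# Assumed base URL, taken from the answer above.
BASE_URL = "http://hockey-reference.com"


def extract_game_links(html, base_url=BASE_URL):
    """Hypothetical helper: return absolute URLs for each <a href> inside a td.gamelink cell."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: <a> elements that carry an href, nested in a <td> whose classes include "gamelink"
    anchors = soup.select("td.gamelink a[href]")
    # urljoin resolves site-relative paths such as "/boxscores/..." against the base URL
    return [urljoin(base_url, a["href"]) for a in anchors]


if __name__ == "__main__":
    for url in extract_game_links(SAMPLE_HTML):
        print(url)
```

On the live page, passing the downloaded HTML (for example page.text from the question's requests call) should work the same way, assuming the site's markup still matches. One reason to prefer urljoin: if the base URL ends in a slash, plain concatenation produces a doubled slash, while urljoin still yields a single one.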
