python-3.x - Scraping Python <a> tags

Question


asked Mar 6, 2021 in Technique by 深蓝 (71.8m points)


I have two <a> tags with different href values, and I only want one of them. Is it possible for BeautifulSoup to select only the tag whose href starts with a particular word?

Thanks in advance if anyone knows.

<a href="https://facebook.com/"></a>

and the other:

<a href="https://Instagram.com/"></a>

asked by Jacksuel Soares Braga, translated from Stack Overflow


1 Answer

深蓝 · Answer 1 · 2021-03-06T04:20:59+0000

There are many ways to do it; here are the three most common (CSS selector, regex and lambda):


import re

from bs4 import BeautifulSoup

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/">TAG 2</a>
'''

soup = BeautifulSoup(data, 'html.parser')

# 1st option - CSS selector: [href^="..."] matches href values that start with the prefix
print(soup.select_one('a[href^="https://instagram"]'))

# 2nd option - regular expression anchored to the start of the value with ^
print(soup.find('a', {'href': re.compile(r'^https://instagram')}))

# 3rd option - lambda called for every tag; it must check that href exists first
print(soup.find(lambda tag: 'href' in tag.attrs and tag['href'].startswith('https://instagram')))

Prints:


<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
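One detail worth checking: the question writes the second link as `https://Instagram.com/` with a capital I, while the test data above uses lowercase. All three options compare case-sensitively, so none of them would match the question's markup as written. A minimal sketch of a case-insensitive variant, assuming the question's original capitalisation is what appears in the real page:

```python
import re

from bs4 import BeautifulSoup

# Markup from the question, keeping its capitalised "Instagram"
data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://Instagram.com/">TAG 2</a>
'''

soup = BeautifulSoup(data, 'html.parser')

# The CSS prefix match compares href case-sensitively, so it finds nothing here
exact_case = soup.select_one('a[href^="https://instagram"]')
print(exact_case)

# re.IGNORECASE lets the anchored regex match "Instagram" as well as "instagram"
pattern = re.compile(r'^https://instagram', re.IGNORECASE)
ignore_case = soup.find('a', href=pattern)
print(ignore_case)
```

If you would rather stay with CSS selectors, Selectors Level 4 also defines an `i` flag (`a[href^="https://instagram" i]`); check that your installed soupsieve version supports it before relying on it.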

EDIT: To select multiple links that start with a given string:

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://instagram.com/B">TAG 4</a>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

for link in soup.select('a[href^="https://instagram"]'):
    print(link)

Prints:


<a href="https://instagram.com/A">TAG 2</a>
<a href="https://instagram.com/B">TAG 4</a>
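Prefix matching only catches the exact scheme and spelling it is given, so links such as `http://instagram.com/...` or `https://www.instagram.com/...` would slip through. If the real goal is "every link that points at Instagram", one alternative (a swapped-in technique, not part of the original answer) is to parse each URL with the standard library's `urllib.parse.urlparse` and compare the hostname. The helper `is_instagram` and the extra `www.` and missing-href entries in the sample data are hypothetical, added only for illustration:

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Sample data from the answer, plus a hypothetical www./mixed-case link and an <a> without href
data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://www.Instagram.com/B">TAG 4</a>
<a>no href</a>
'''


def is_instagram(href):
    """Hypothetical helper: True when href points at instagram.com or a subdomain of it."""
    host = urlparse(href).hostname or ''  # .hostname is returned lower-cased
    return host == 'instagram.com' or host.endswith('.instagram.com')


soup = BeautifulSoup(data, 'html.parser')

# href=True skips <a> tags that have no href attribute at all
links = [a['href'] for a in soup.find_all('a', href=True) if is_instagram(a['href'])]
print(links)
```

Because the comparison is done on the parsed hostname, the check no longer depends on the scheme, a `www.` prefix, or the letter case of the domain.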

For a CSS selector reference, use this link.
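Besides `^=` (starts with), the CSS attribute selectors that BeautifulSoup's `select()` accepts through soupsieve include `$=` (ends with), `*=` (contains) and negation with `:not()`. A short sketch reusing the sample data from the EDIT section; the particular suffix and substring values are only illustrative:

```python
from bs4 import BeautifulSoup

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://instagram.com/B">TAG 4</a>
'''

soup = BeautifulSoup(data, 'html.parser')

starts = soup.select('a[href^="https://instagram"]')               # href starts with
ends = soup.select('a[href$="/B"]')                                # href ends with
contains = soup.select('a[href*="facebook"]')                      # href contains
not_instagram = soup.select('a:not([href^="https://instagram"])')  # negated prefix match

for label, tags in [('starts', starts), ('ends', ends),
                    ('contains', contains), ('not', not_instagram)]:
    print(label, [tag.get_text() for tag in tags])
```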
