There are many ways to do it; here are the three most common (CSS selector, regex, and lambda):
data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/">TAG 2</a>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
# 1st option - CSS selector
print(soup.select_one('a[href^="https://instagram"]'))
# 2nd option - using regexp
import re
print(soup.find('a', {'href': re.compile(r'^https://instagram')}))
# 3rd option - using lambda
print(soup.find(lambda tag: 'href' in tag.attrs and tag['href'].startswith('https://instagram')))
Prints:
<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
EDIT: To select multiple links that start with a given string:
data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://instagram.com/B">TAG 4</a>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
for link in soup.select('a[href^="https://instagram"]'):
    print(link)
Prints:
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://instagram.com/B">TAG 4</a>
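The regex and lambda options from the first snippet should also work for multiple links if `find` is swapped for `find_all`. A minimal sketch reusing the same sample markup (the names `by_regex` and `by_lambda` are only illustrative):

```python
import re
from bs4 import BeautifulSoup

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://instagram.com/B">TAG 4</a>
'''
soup = BeautifulSoup(data, 'html.parser')

# Regex option: ^ anchors the pattern at the start of the href value
by_regex = soup.find_all('a', href=re.compile(r'^https://instagram'))

# Lambda option: tag.get() falls back to '' when a tag has no href
by_lambda = soup.find_all(
    lambda tag: tag.name == 'a'
    and tag.get('href', '').startswith('https://instagram')
)

for link in by_regex:
    print(link['href'])
```

Both lists hold the same two tags here; the lambda form is the more flexible one if the condition later needs more than a prefix check.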
For a CSS selector reference, use this link.
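As a quick illustration of what such a reference covers, `^=` is one of three substring attribute operators in CSS; `$=` matches the end of a value and `*=` matches anywhere in it. A small sketch, assuming a recent Beautiful Soup where `select` supports them (the sample markup below is made up for the example):

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not taken from the question
html = '''
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://example.com/report.pdf">TAG 5</a>
<a href="https://m.instagram.com/B">TAG 6</a>
'''
soup = BeautifulSoup(html, 'html.parser')

starts = soup.select('a[href^="https://instagram"]')  # href begins with the string
ends = soup.select('a[href$=".pdf"]')                 # href ends with the string
contains = soup.select('a[href*="instagram"]')        # href contains the string

print([a.text for a in starts])
print([a.text for a in ends])
print([a.text for a in contains])
```

Note that `TAG 6` is matched only by `*=`, since its href contains "instagram" but does not start with `https://instagram`.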