python - How do I get scraped elements into a single list BeautifulSoup?

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I would like the result to be a single list of individual strings instead of the current output. Basically, I want only the last list shown in the output, with all the strings together in one list. Any help would be appreciated.

import numpy as np
import requests
from bs4 import BeautifulSoup as bs

headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36"
}

headlines = []
pages = np.arange(1, 3)  # pages 1 and 2

for page in pages:
    url = 'https://www.marketwatch.com/investing/stock/aapl/moreheadlines?channel=MarketWatch&pageNumber=' + str(page)
    results = requests.get(url, headers=headers)
    soup = bs(results.text, "html.parser")
    contents = soup.find_all("div", class_='article__content')
    for i in contents:
        headline = i.find("h3", class_='article__headline').text.strip()
        headlines.append(headline)
        print(headlines)

Then the output is this:

['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying', 'The Dow Fell 12 Points Because Intel and Apple Stock Softened the Blow']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying', 'The Dow Fell 12 Points Because Intel and Apple Stock Softened the Blow', 'Apple Inc. stock outperforms market on strong trading day']

question from:https://stackoverflow.com/questions/65906663/how-do-i-get-scraped-elements-into-a-single-list-beautifulsoup

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:13:09+0000

What happens?

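The headlines are all in that one list already. The issue is the indentation of your print call: it sits inside the inner loop, so the growing list is printed after every append. Move it outside both loops so the list is printed only once.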

for page in pages:
    url = 'https://www.marketwatch.com/investing/stock/aapl/moreheadlines?channel=MarketWatch&pageNumber=' + str(page)
    results = requests.get(url, headers=headers)
    soup = bs(results.text, "html.parser")
    contents = soup.find_all("div", class_='article__content')
    for i in contents:
        headline = i.find("h3", class_='article__headline').text.strip()
        headlines.append(headline)

# Print once, after both loops have finished
print(headlines)
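
The same point can be checked without any network access. The sketch below replaces the scraped pages with made-up data (the `fake_pages` name and its contents are assumptions for illustration only): appending inside the loops builds one flat list, and printing after the loops shows it a single time.

```python
# Hypothetical stand-in for the headlines found on each scraped page.
fake_pages = [
    ["Headline A", " Headline B "],
    ["Headline C"],
]

headlines = []
for page in fake_pages:
    for headline in page:
        # Same pattern as the scraper: clean the text, then append.
        headlines.append(headline.strip())

# Printed once, after both loops have finished.
print(headlines)
```

If the print call were indented into the inner loop instead, it would run three times here, each time showing the list as it stood at that moment.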

By the way, you can simplify the selection with a single CSS selector:

soup = bs(results.text, "html.parser")
for headline in soup.select('div.article__content h3.article__headline'):
    headlines.append(headline.get_text(strip=True))
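
To try the selector approach offline, here is a minimal sketch that runs against a small made-up HTML string instead of the live page. The `sample_html` markup is an assumption that only mimics the class names used in the question; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the class names the question's code relies on.
sample_html = """
<div class="article__content">
  <h3 class="article__headline"> First headline </h3>
</div>
<div class="article__content">
  <h3 class="article__headline">Second headline</h3>
</div>
<div class="other">
  <h3 class="article__headline">Not inside article__content</h3>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# One CSS selector replaces the find_all + find pair; the descendant
# combinator keeps only h3 elements nested in div.article__content.
headlines = [
    h3.get_text(strip=True)
    for h3 in soup.select("div.article__content h3.article__headline")
]

print(headlines)
```

The list comprehension produces the flat list directly, so no separate append step is needed for a single page; for several pages, `headlines.extend(...)` with the same expression would keep everything in one list.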
