Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How do I get scraped elements into a single list with BeautifulSoup?

I would like the result to be a single list of individual strings instead of the current output. Basically, I want only the last list shown below, with all the strings together in one list. Any help would be appreciated.

# Imports assumed from the aliases used below (np, requests, bs).
import numpy as np
import requests
from bs4 import BeautifulSoup as bs

# Send a browser-like User-Agent with each request.
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36"
}

headlines = []
pages = np.arange(1, 3)  # page numbers 1 and 2

for page in pages:
    url = 'https://www.marketwatch.com/investing/stock/aapl/moreheadlines?channel=MarketWatch&pageNumber=' + str(page)
    results = requests.get(url, headers=headers)
    soup = bs(results.text, "html.parser")
    contents = soup.find_all("div", class_='article__content')
    for i in contents:
        headline = i.find("h3", class_='article__headline').text.strip()
        headlines.append(headline)
        print(headlines)

Then the output is this:

['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying', 'The Dow Fell 12 Points Because Intel and Apple Stock Softened the Blow']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying', 'The Dow Fell 12 Points Because Intel and Apple Stock Softened the Blow', 'Apple Inc. stock outperforms market on strong trading day']


Question from: https://stackoverflow.com/questions/65906663/how-do-i-get-scraped-elements-into-a-single-list-beautifulsoup


1 Answer

What happens?

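All the headlines are already in that one list. The issue is the indentation of your print call: it sits inside the inner loop, so the growing list is printed again after every append. Move it outside both loops so the list is printed only once.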

for page in pages:
    url = 'https://www.marketwatch.com/investing/stock/aapl/moreheadlines?channel=MarketWatch&pageNumber=' + str(page)
    results = requests.get(url, headers=headers)
    soup = bs(results.text, "html.parser")
    contents = soup.find_all("div", class_='article__content')
    for i in contents:
        headline = i.find("h3", class_='article__headline').text.strip()
        headlines.append(headline)

# Print once, after both loops have finished.
print(headlines)
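
To see the effect of the print placement without hitting the network, here is a minimal sketch. The nested lists in `pages` are hypothetical stand-ins for the headlines found on each page, and `prints_inside_loop` is an illustrative counter, not part of the original code.

```python
# Hypothetical stand-in for the scraped pages: each inner list plays the role
# of the headlines found on one page.
pages = [
    ["headline A", "headline B"],
    ["headline C"],
]

headlines = []
prints_inside_loop = 0  # how often a print() in the inner loop would have run

for page in pages:
    for headline in page:
        headlines.append(headline)
        prints_inside_loop += 1  # a print(headlines) here runs once per append

# Printing after both loops shows the finished list a single time.
print(headlines)
print(f"A print inside the inner loop would have run {prints_inside_loop} times.")
```

The list itself is identical in both cases. Only the number of times it gets printed changes.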

By the way, you can also simplify the selection with a CSS selector:

soup = bs(results.text, "html.parser")
for headline in soup.select('div.article__content h3.article__headline'):
    headlines.append(headline.get_text(strip=True))
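
As a rough, self-contained sketch of the selector approach, the snippet below runs against inline HTML instead of the live site. The markup in `SAMPLE_PAGES` and the helper name `collect_headlines` are assumptions made for illustration; only the class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used in the question.
SAMPLE_PAGES = [
    """
    <div class="article__content"><h3 class="article__headline"> First headline </h3></div>
    <div class="article__content"><h3 class="article__headline">Second headline</h3></div>
    """,
    """
    <div class="article__content"><h3 class="article__headline">Third headline</h3></div>
    <div class="other"><h3 class="article__headline">Skipped: wrong parent</h3></div>
    """,
]


def collect_headlines(pages):
    """Return one flat list with the headline text from every page."""
    headlines = []
    for html in pages:
        soup = BeautifulSoup(html, "html.parser")
        # The descendant selector only matches h3 tags inside the content div.
        headlines.extend(
            tag.get_text(strip=True)
            for tag in soup.select("div.article__content h3.article__headline")
        )
    return headlines


headlines = collect_headlines(SAMPLE_PAGES)
print(headlines)
```

Using `extend` with a generator keeps everything in a single flat list, and `get_text(strip=True)` removes the surrounding whitespace in the same step.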
