Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

web scraping - Google Sheets ImportXML function assistance needed

Thank you in advance for any help you can provide!

So... 90% of the time ImportXML seems to work just fine for me, but now I'm struggling with the two cases below. I don't know whether they share the same cause or are two different problems.

Any help appreciated!!!

CASE ONE - YAHOO

  1. Go to this page: https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL
  2. The number I want to pull to my spreadsheet is "Free Cash Flow"

My first attempt:

=IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","//*[@id='Col1-1-Financials-Proxy']/section/div[4]/div[1]/div[1]/div[2]/div[12]/div[1]/div[2]/span")

My second attempt:

=IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[2]/div/div/section/div[4]/div[1]/div[1]/div[2]/div[12]/div[1]/div[2]/span")

My third attempt:

=INDEX(IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","//div[@class='Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(140px)--pnclg Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor) D(tbc)']"),1,1)

My fourth attempt:

=INDEX(IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","//span[@data-reactid='277']"),1,1)

Nothing I do seems to work... please help!

CASE TWO - MSN

  1. Go to this page: https://www.msn.com/en-gb/money/stockdetails/analysis/fi-a1mou2
  2. Click the "Price Ratios" link
  3. The number I want to pull to my spreadsheet is "P/E Ratio 5-Year Low"

My first attempt:

=IMPORTXML("https://www.msn.com/en-us/money/stockdetails/analysis/nas-aapl/fi-a1mou2","//*[@id='main']/div[2]/div[2]/div[2]/div/div[3]/div/div/div[5]/div[1]/div[2]/div[4]/div[1]/div/div/div/ul[3]/li[2]/span[1]/p")

I only tried once with this case because I suspect the issue is that the number sits on an internal page tab. Help?

ANY solution that will automatically pull the above two numbers into my spreadsheet is welcome. I'm open to workarounds with scripts/macros if ImportXML just isn't able to do it.

Thank you so much to all contributors who make this community so great!

You are all unsung heroes!

Question from: https://stackoverflow.com/questions/65931088/google-sheets-importxml-function-assistance-needed

1 Answer

You can't get the data from MSN because the specific element you mention is inserted dynamically on the page. IMPORTXML can only retrieve a website's static content, so it cannot retrieve this dynamic content.

To check which content is static and which is dynamic, disable JavaScript in your browser (JavaScript is what inserts the dynamic content) and reload the page: whatever remains is the content you can access with IMPORTXML. If you do this on the page you linked, you will see that clicking Price Ratios changes nothing, because that content is not static. In Chrome, JavaScript can be disabled in the site settings.
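
The same check can be roughly automated from the spreadsheet side. The following is a minimal Apps Script sketch, not part of the original answer: it fetches the raw HTML (no JavaScript is executed) and reports whether a piece of text appears in the visible markup. The function names `htmlContainsText` and `IS_STATIC_TEXT` are made up for illustration, and the server may return different HTML to Apps Script than to your browser, so treat the result as a hint rather than proof.

```javascript
// Pure helper: does the raw (pre-JavaScript) HTML contain the given text
// outside of <script> and <style> blocks?
function htmlContainsText(html, needle) {
  // Drop script and style bodies first, so text that only exists as data
  // for JavaScript to render later is not counted as static content.
  var visible = html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' '); // strip the remaining tags
  return visible.toLowerCase().indexOf(needle.toLowerCase()) !== -1;
}

// Apps Script wrapper (hypothetical name). UrlFetchApp only exists inside
// Google Apps Script, so this function will not run in other environments.
function IS_STATIC_TEXT(url, needle) {
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  return htmlContainsText(response.getContentText(), needle);
}
```

Saved in the spreadsheet's script editor, it could be called from a cell as `=IS_STATIC_TEXT("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL", "Free Cash Flow")`. If it returns FALSE, the label is probably inserted by JavaScript and no XPath will reach it through IMPORTXML.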

Therefore, you will need to find an alternative method to scrape dynamic data.
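
One possible alternative, sketched here under assumptions the original answer does not confirm: dynamic pages often load their numbers from a JSON endpoint that shows up in the browser's Network tab. If you can find such an endpoint for your page, an Apps Script custom function can fetch it and pull out one field. The names `pickJsonField` and `IMPORT_JSON_FIELD` are hypothetical, and the endpoint URL and field path are things you would have to discover yourself; neither Yahoo nor MSN is known here to expose one.

```javascript
// Pure helper: parse JSON text and walk a dotted path such as "a.b.0.c".
// Returns null when any step of the path is missing.
function pickJsonField(jsonText, path) {
  var node = JSON.parse(jsonText);
  var keys = path.split('.');
  for (var i = 0; i < keys.length; i++) {
    if (node === null || typeof node !== 'object' || !(keys[i] in node)) {
      return null; // path not present in this response
    }
    node = node[keys[i]];
  }
  return node;
}

// Apps Script wrapper (hypothetical name). UrlFetchApp only exists inside
// Google Apps Script, so this function will not run in other environments.
function IMPORT_JSON_FIELD(url, path) {
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) {
    throw new Error('Request failed with HTTP ' + response.getResponseCode());
  }
  return pickJsonField(response.getContentText(), path);
}
```

In a cell this would look like `=IMPORT_JSON_FIELD(A1, "cashFlow.rows.0.value")`, where A1 holds the endpoint URL and the path is only a placeholder for whatever structure the real response has.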

