Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Twitter: Can I directly extract the contents of a tweet using Google Sheets formulas?

=INDEX(IMPORTXML(B20,"//a[@class='css-4rbku5 etc etc']"),1)

I used a command like this to get the headlines of Reddit threads, and I was hoping the text of tweets could be extracted the same way. But I keep getting the error "Imported content is empty".

I'd appreciate it if this can be done without APIs and such. Thanks!

1 Answer

Twitter is dynamically generated

This means that the HTML returned when you request the page does not contain any tweets. Once that basic HTML has loaded, JavaScript running in the browser fetches the tweets and populates them into the page.

Since IMPORTXML only downloads that initial HTML and does not run any JavaScript, the HTML it receives will not contain any tweets. Twitter itself also tries to limit any scraping that is not done via the API.
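To see why the error says "Imported content is empty", here is a minimal Python sketch of what an XPath query like `//a[@class='…']` does against static HTML. It uses only the standard library's `html.parser`; the class `ClassMatcher`, the helper `find_text`, and the `RAW_SHELL` snippet are hypothetical illustrations (the snippet is a trimmed-down page shell, not Twitter's exact markup). Like IMPORTXML, the parser never runs JavaScript, so it can only find what is literally in the downloaded markup.

```python
from html.parser import HTMLParser


class ClassMatcher(HTMLParser):
    """Collect the text inside <tag class="css_class"> elements of static HTML.

    Like IMPORTXML, this only sees the markup as downloaded; no JavaScript runs,
    so content that a script would inject later is simply not there.
    """

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.capturing = False
        self.matches: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Exact string comparison, mirroring an XPath test like @class='...'.
        if tag == self.tag and dict(attrs).get("class") == self.css_class:
            self.capturing = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.matches[-1] += data


def find_text(html: str, tag: str, css_class: str) -> list[str]:
    parser = ClassMatcher(tag, css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Hypothetical, heavily trimmed page shell: head metadata plus an empty
# mount point that JavaScript would fill in a real browser.
RAW_SHELL = """<!DOCTYPE html>
<html dir="ltr" lang="en">
<meta charset="utf-8" />
<link rel="canonical" href="https://twitter.com/twitter/status/1380306486962782208" />
<body><div id="react-root"></div><script src="main.js"></script></body>
</html>"""

print(find_text(RAW_SHELL, "a", "css-4rbku5"))  # [] -> "Imported content is empty"
```

Note that an exact `@class='…'` comparison also fails as soon as the element carries one extra class (e.g. `class="css-4rbku5 r-1"`), which is one more reason selectors built on auto-generated classes break easily.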

So unfortunately, your best recourse is either to use the API (it has a free tier), or to learn some web automation software such as Puppeteer, which can emulate a person visiting the site: it loads the page, waits a moment for the content to load, and then scrapes the data.
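For the API route, a request for a single tweet can be sketched as follows. This assumes the v2 tweet lookup endpoint (`GET /2/tweets/:id`, which returns the tweet text under `data.text`) and a bearer token whose access level includes tweet lookup; `build_tweet_request` and `tweet_text` are hypothetical helper names. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed base URL of the v2 tweet lookup endpoint.
API_BASE = "https://api.twitter.com/2/tweets/"


def build_tweet_request(tweet_id: str, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /2/tweets/:id request with bearer auth."""
    return urllib.request.Request(
        API_BASE + tweet_id,
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def tweet_text(response_body: bytes) -> str:
    """Extract the text from a lookup response shaped like {"data": {"id": ..., "text": ...}}."""
    return json.loads(response_body)["data"]["text"]


# Usage (needs a real bearer token and network access; not run here):
# req = build_tweet_request("1380306486962782208", "YOUR_BEARER_TOKEN")
# with urllib.request.urlopen(req) as resp:
#     print(tweet_text(resp.read()))
```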

Disclaimer: It is possible that IMPORTXML will work using the CSS class as you have done, but this will be very unreliable. The CSS classes are auto-generated, so they change often.

Demonstration

Using curl on the command line to fetch the raw HTML of https://twitter.com/Twitter/status/1380306486962782208 (before any JavaScript runs):

curl https://twitter.com/Twitter/status/1380306486962782208 > twitter.html
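If you'd rather do the same raw download from Python, a minimal sketch with the standard library's `urllib.request` looks like this. `save_raw_html` is a hypothetical helper, and the `User-Agent` value is an arbitrary assumption. Like curl, it fetches only the HTML the server sends; no JavaScript runs, so nothing gets added to the page afterwards.

```python
import urllib.request


def save_raw_html(url: str, dest: str) -> int:
    """Download the page exactly as served (no JavaScript runs) and save it.

    Overwrites dest, like `curl URL > dest`. Returns the number of bytes written.
    """
    # Assumed browser-like User-Agent; some sites treat Python's default differently.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    with open(dest, "wb") as fh:
        fh.write(body)
    return len(body)


# Usage (needs network access):
# save_raw_html("https://twitter.com/Twitter/status/1380306486962782208", "twitter.html")
```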

Examining the resulting twitter.html file:

<!DOCTYPE html>
<html dir="ltr" lang="en">
   <meta charset="utf-8" />
   <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0,viewport-fit=cover" />
   <link rel="preconnect" href="//abs.twimg.com" />
   <link rel="dns-prefetch" href="//abs.twimg.com" />
   <link rel="preconnect" href="//api.twitter.com" />
   <link rel="dns-prefetch" href="//api.twitter.com" />
   <link rel="preconnect" href="//pbs.twimg.com" />
   <link rel="dns-prefetch" href="//pbs.twimg.com" />
   <link rel="preconnect" href="//t.co" />
   <link rel="dns-prefetch" href="//t.co" />
   <link rel="preconnect" href="//video.twimg.com" />
   <link rel="dns-prefetch" href="//video.twimg.com" />
   <link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web-legacy/polyfills.98da7185.js" nonce="MjE5MTk0YTItNjQxMy00NzhjLWE0ZWEtNTA0NzEwMzdkNmQy" />
   <link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web-legacy/vendors~main.6fa4fac5.js" nonce="MjE5MTk0YTItNjQxMy00NzhjLWE0ZWEtNTA0NzEwMzdkNmQy" />
   <link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web-legacy/i18n/en.2eb8dfe5.js" nonce="MjE5MTk0YTItNjQxMy00NzhjLWE0ZWEtNTA0NzEwMzdkNmQy" />
   <link rel="preload" as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/client-web-legacy/main.88d8e8e5.js" nonce="MjE5MTk0YTItNjQxMy00NzhjLWE0ZWEtNTA0NzEwMzdkNmQy" />
   <meta property="fb:app_id" content="2231777543" />
   <meta property="og:site_name" content="Twitter" />
   <meta name="google-site-verification" content="acYOOcR5z6puMzLn6hLDZI1nNHXPxt57OIstz1vnCV0" />
   <meta name="facebook-domain-verification" content="x6sdcc8b5ju3bh8nbm59eswogvg6t1" />
   <link rel="manifest" href="/manifest.json" crossOrigin="use-credentials" />
   <link rel="alternate" hreflang="x-default" href="https://twitter.com/twitter/status/1380306486962782208" />
   <link rel="alternate" hreflang="en" href="https://twitter.com/twitter/status/1380306486962782208?lang=en" />
   <link rel="alternate" hreflang="ar" href="https://twitter.com/twitter/status/1380306486962782208?lang=ar" />
   <link rel="alternate" hreflang="ar-x-fm" href="https://twitter.com/twitter/status/1380306486962782208?lang=ar-x-fm" />
   <link rel="alternate" hreflang="bg" href="https://twitter.com/twitter/status/1380306486962782208?lang=bg" />
   <link rel="alternate" hreflang="bn" href="https://twitter.com/twitter/status/1380306486962782208?lang=bn" />
   <link rel="alternate" hreflang="ca" href="https://twitter.com/twitter/status/1380306486962782208?lang=ca" />
   <link rel="alternate" hreflang="cs" href="https://twitter.com/twitter/status/1380306486962782208?lang=cs" />
   <link rel="alternate" hreflang="da" href="https://twitter.com/twitter/status/1380306486962782208?lang=da" />
   <link rel="alternate" hreflang="de" href="https://twitter.com/twitter/status/1380306486962782208?lang=de" />
   <link rel="alternate" hreflang="el" href="https://twitter.com/twitter/status/1380306486962782208?lang=el" />
   <link rel="alternate" hreflang="en-GB" href="https://twitter.com/twitter/status/1380306486962782208?lang=en-GB" />
   <link rel="alternate" hreflang="en-ss" href="https://twitter.com/twitter/status/1380306486962782208?lang=en-ss" />
   <link rel="alternate" hreflang="en-xx" href="https://twitter.com/twitter/status/1380306486962782208?lang=en-xx" />
   <link rel="alternate" hreflang="es" href="https://twitter.com/twitter/status/1380306486962782208?lang=es" />
   <link rel="alternate" hreflang="eu" href="https://twitter.com/twitter/status/1380306486962782208?lang=eu" />
   <link rel="alternate" hreflang="fa" href="https://twitter.com/twitter/status/1380306486962782208?lang=fa" />
   <link rel="alternate" hreflang="fi" href="https://twitter.com/twitter/status/1380306486962782208?lang=fi" />
   <link rel="alternate" hreflang="fil" href="https://twitter.com/twitter/status/1380306486962782208?lang=fil" />
   <link rel="alternate" hreflang="fr" href="https://twitter.com/twitter/status/1380306486962782208?lang=fr" />
   <link rel="alternate" hreflang="ga" href="https://twitter.com/twitter/status/1380306486962782208?lang=ga" />
   <link rel="alternate" hreflang="gl" href="https://twitter.com/twitter/status/1380306486962782208?lang=gl" />
   <link rel="alternate" hreflang="gu" href="https://twitter.com/twitter/status/1380306486962782208?lang=gu" />
   <link rel="alternate" hreflang="he" href="https://twitter.com/twitter/status/1380306486962782208?lang=he" />
   <link rel="alternate" hreflang="hi" href="https://twitter.com/twitter/status/1380306486962782208?lang=hi" />
   <link rel="alternate" hreflang="hr" href="https://twitter.com/twitter/status/1380306486962782208?lang=hr" />
   <link rel="alternate" hreflang="hu" href="https://twitter.com/twitter/status/1380306486962782208?lang=hu" />
   <link rel="alternate" hreflang="id" href="https://twitter.com/twitter/status/1380306486962782208?lang=id" />
   <link rel="alternate" hreflang="it" href="https://twitter.com/twitter/status/1380306486962782208?lang=it" />
   <link rel="alternate" hreflang="ja" href="https://twitter.com/twitter/status/1380306486962782208?lang=ja" />
   <link rel="alternate" hreflang="kn" href="https://twitter.com/twitter/status/1380306486962782208?lang=kn" />
   <link rel="alternate" hreflang="ko" href="https://twitter.com/twitter/status/1380306486962782208?lang=ko" />
   <link rel="alternate" hreflang="mr" href="https://twitter.com/twitter/status/1380306486962782208?lang=mr" />
   <link rel="alternate" hreflang="ms" href="https://twitter.com/twitter/status/1380306486962782208?lang=ms" />
   <link rel="alternate" hreflang="nb" href="https://twitter.com/twitter/status/1380306486962782208?lang=nb" />
   <link rel="alternate" hreflang="nl" href="https://twitter.com/twitter/status/1380306486962782208?lang=nl" />
   <link rel="alternate" hreflang="pl" href="https://twitter.com/twitter/status/1380306486962782208?lang=pl" />
   <link rel="alternate" hreflang="pt" href="https://twitter.com/twitter/status/1380306486962782208?lang=pt" />
   <link rel="alternate" hreflang="ro" href="https://twitter.com/twitter/status/1380306486962782208?lang=ro" />
   <link rel="alternate" hreflang="ru" href="https://twitter.com/twitter/status/1380306486962782208?lang=ru" />
   <link rel="alternate" hreflang="sk" href="https://twitter.com/twitter/status/1380306486962782208?lang=sk" />
   <link rel="alternate" hreflang="sr" href="https://twitter.com/twitter/status/1380306486962782208?lang=sr" />
   <link rel="alternate" hreflang="sv" href="https://twitter.com/twitter/status/1380306486962782208?lang=sv" />
   <link rel="alternate" hreflang="ta" href="https://twitter.com/twitter/status/1380306486962782208?lang=ta" />
   <link rel="alternate" hreflang="th" href="https://twitter.com/twitter/status/1380306486962782208?lang=th" />
   <link rel="alternate" hreflang="tr" href="https://twitter.com/twitter/status/1380306486962782208?lang=tr" />
   <link rel="alternate" hreflang="uk" href="https://twitter.com/twitter/status/1380306486962782208?lang=uk" />
   <link rel="alternate" hreflang="ur" href="https://twitter.com/twitter/status/1380306486962782208?lang=ur" />
   <link rel="alternate" hreflang="vi" href="https://twitter.com/twitter/status/1380306486962782208?lang=vi" />
   <link rel="alternate" hreflang="zh" href="https://twitter.com/twitter/status/1380306486962782208?lang=zh" />
   <link rel="alternate" hreflang="zh-Hant" href="https://twitter.com/twitter/status/1380306486962782208?lang=zh-Hant" />
   <link rel="canonical" href="https://twitter.com/twitter/status/1380306486962782208" />
   <link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml" title="Twitter">
   <link rel="mask-icon" size
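To check a saved file like this for tweet content, one option is to count its tags and collect its visible text, ignoring `<script>` and `<style>` bodies. Below is a minimal sketch using `html.parser`; `PageInventory` and `inventory` are hypothetical names, and `twitter.html` is simply the file name used by the curl command above.

```python
from collections import Counter
from html.parser import HTMLParser


class PageInventory(HTMLParser):
    """Count every tag and collect visible text, skipping <script>/<style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter[str] = Counter()
        self.text: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        self.tags[tag] += 1
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def inventory(path: str) -> tuple[Counter, list[str]]:
    parser = PageInventory()
    with open(path, encoding="utf-8") as fh:
        parser.feed(fh.read())
    parser.close()
    return parser.tags, parser.text


# Usage on the file saved by the curl command:
# tags, text = inventory("twitter.html")
# print(tags.most_common(5))  # the most frequent tags in the raw page
# print(text[:10])            # the first few visible text fragments
```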

...