Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - How to scrape an iframe using Selenium?

I want to extract all the comments from a website. The site uses an iframe for its comment section. I already tried scraping it with Selenium, but unfortunately I can only scrape one comment. How can I scrape the rest of the comments and save them to CSV or XLSX?

  • Code:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()
    page = driver.get("https://finance.detik.com/berita-ekonomi-bisnis/d-5307853/ri-disebut-punya-risiko-korupsi-yang-tinggi?_ga=2.13736693.357978333.1608782559-293324864.1608782559")
    
    iframe = WebDriverWait(driver,20).until(EC.presence_of_element_located((By.XPATH, "//iframe[@class='xcomponent-component-frame xcomponent-visible']")))
    driver.switch_to.frame(iframe)
    
    xpath = '//*[@id="cmt66363941"]/div[1]/div[1]'
    extract_name = WebDriverWait(driver,20).until(EC.presence_of_element_located((By.XPATH, xpath)))
    username=extract_name.text
    
    xpath = '//*[@id="cmt66363941"]/div[1]/div[2]'
    extract_comment = WebDriverWait(driver,20).until(EC.presence_of_element_located((By.XPATH, xpath)))
    comment=extract_comment.text
    
    print(username, comment)
  • Output
    King Akbarmachinery
    3 hari yang lalu selama korupsi tidak dihukum mati disanalah korupsi masih liar dan ada kalaupun dibuat hukum mati setidaknya bisa mengurangi angka korupsi itu
    Laporkan
    2BalasBagikan:

By the way, how do I remove these lines from the output?

Laporkan
2BalasBagikan:

1 Answer

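Generalize your XPaths so they match every username and every comment at once, instead of targeting a single comment ID such as cmt66363941. Then use presence_of_all_elements_located, which waits until at least one matching element is present and returns all matching elements as a list: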

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get(
    "https://finance.detik.com/berita-ekonomi-bisnis/d-5307853/ri-disebut-punya-risiko-korupsi-yang-tinggi?_ga=2.13736693.357978333.1608782559-293324864.1608782559")

iframe = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.XPATH, "//iframe[@class='xcomponent-component-frame xcomponent-visible']")))
driver.switch_to.frame(iframe)

xpath_users = "//div[contains(@class, 'comment__cmt_dk_name___EGuzI ')]"
extract_names = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, xpath_users)))

xpath_comments = "//div[contains(@class, 'comment__cmt_box_text')]"
extract_comments = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, xpath_comments)))

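for user, comment in zip(extract_names, extract_comments):
    # Keep only the first line of each element's text. This drops the
    # "Laporkan" / "2BalasBagikan:" footer lines, but also any later
    # lines of a multi-line comment.
    user = user.text.split("\n")[0]
    comment = comment.text.split("\n")[0]
    print(user, comment)

driver.quit()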
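Taking only the first line works for single-line comments, but it discards the rest of a multi-line comment. Below is a sketch of an alternative that drops only the lines made up of the footer labels seen in the question's output: "Laporkan" (Report) and the "Balas"/"Bagikan" (Reply/Share) buttons with their counter. The label list is an assumption based on that one output sample, and clean_comment is a hypothetical helper name.

```python
# Hypothetical label list, taken from the single output sample in the
# question; detik.com may render other or different footer labels.
UI_LABELS = ("Laporkan", "Balas", "Bagikan")


def clean_comment(raw_text: str) -> str:
    """Drop lines that consist only of UI labels, counters and colons."""
    kept = []
    for line in raw_text.splitlines():
        stripped = line.strip()
        # Remove every known label, then strip digits, colons and spaces.
        residue = stripped
        for label in UI_LABELS:
            residue = residue.replace(label, "")
        residue = residue.strip("0123456789: ")
        # Nothing left means the line was pure UI chrome. Note that a comment
        # line made only of digits (e.g. "2") is dropped as well.
        if residue:
            kept.append(stripped)
    return "\n".join(kept)
```

Inside the loop you would call comment = clean_comment(comment.text) in place of the split call.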

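The code above only prints the pairs, but the question also asks how to save them. Here is a minimal sketch using the standard-library csv module. It assumes the (user, comment) pairs are first collected into a list; the function name save_comments_csv and the file name comments.csv are hypothetical.

```python
import csv
from typing import Iterable, Tuple


def save_comments_csv(rows: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (username, comment) pairs to a UTF-8 CSV file; return the row count."""
    count = 0
    # newline="" lets the csv module control line endings, so comments that
    # contain commas or newlines are quoted correctly.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "comment"])  # header row
        for username, comment in rows:
            writer.writerow([username, comment])
            count += 1
    return count
```

For example, append (user, comment) to a list rows inside the loop, then call save_comments_csv(rows, "comments.csv") after it. For an Excel file, pandas.DataFrame(rows, columns=["username", "comment"]).to_excel("comments.xlsx", index=False) is an alternative; it needs openpyxl installed.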
...