urllib - How to extract tables from websites in Python

On this page:

http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500

there is a table. My goal is to extract that table and save it to a CSV file. I wrote this code:

import urllib
import os

# Download the page HTML (Python 2 urllib; in Python 3 this is urllib.request.urlopen)
web = urllib.urlopen("http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500")

s = web.read()
web.close()

# Save the raw HTML to a file
ff = open(r"D:\ex\python_ex\urllib\output.txt", "w")
ff.write(s)
ff.close()

I'm lost from here. Can anyone help with this? Thanks!


1 Answer


Pandas can do this right out of the box, saving you from having to parse the HTML yourself. pandas.read_html() extracts all tables from the HTML and returns them as a list of DataFrames. Each DataFrame can then be written to a CSV file with to_csv(). For the web page in your example, the relevant table is the last one, which is why I used df_list[-1] in the code below.

import requests
import pandas as pd

url = 'http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500'
html = requests.get(url).content   # download the raw HTML
df_list = pd.read_html(html)       # parse every <table> on the page into a DataFrame
df = df_list[-1]                   # the demographic table is the last one on the page
print(df)
df.to_csv('my data.csv')           # write the table to a CSV file

It's simple enough to do in one line, if you prefer:

pd.read_html(requests.get(<url>).content)[-1].to_csv(<csv file>)
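
If you would rather stay close to the urllib approach from your question, here is a minimal sketch using Python 3's urllib.request instead of requests, feeding the downloaded HTML straight into pandas. It assumes the page is UTF-8 encoded, and 'output.csv' is just an example filename:

import io
import urllib.request
import pandas as pd

url = 'http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500'

# Download the page and decode the bytes to text (assuming UTF-8)
with urllib.request.urlopen(url) as resp:
    html = resp.read().decode('utf-8')

# read_html accepts a file-like object, so wrap the string in StringIO
df = pd.read_html(io.StringIO(html))[-1]
df.to_csv('output.csv', index=False)   # index=False drops the DataFrame's row-index column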

P.S. Just make sure you have the lxml, html5lib, and BeautifulSoup4 packages installed in advance.
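
If any of those are missing, they can typically be installed with pip, for example:

pip install lxml html5lib beautifulsoup4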

