I am trying to scrape data from HTML tables with Python using pandas. The tables are at several URLs, so I create a list of them.
Some cells in specific columns of each table contain two values. I manage to read all the data, print it and save it to a CSV file. This is how I do it.
My code so far is:
```python
# -*- coding: utf-8 -*-
import pandas as pd

urls = ["https://url?date=2020-12-31", "https://url?date=2020-12-30", "https://url?date=2020-12-29"]

for url in urls:
    # read_html returns a *list* of DataFrames, one per <table> on the page
    tables = pd.read_html(url, parse_dates=True)
    df = tables[0]  # the first table on the page
    print(df)
    # append this page's rows to the same CSV file (no header, no index)
    df.to_csv('file.csv', encoding='utf-8', mode='a', header=False, index=False)

print("Data have been extracted successfully")
```
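As an alternative to appending each page to `file.csv` with `mode='a'` (which adds the rows again every time the script is rerun), the first table of every page could be stacked with `pd.concat` and written once. This is a minimal sketch: `stack_first_tables` is a hypothetical helper, and it assumes the tables for all dates share the same columns (otherwise `pd.concat` fills the missing cells with NaN).

```python
import pandas as pd


def stack_first_tables(pages):
    """Stack the first table of each page into one DataFrame.

    `pages` is assumed to be a list with one entry per URL, each entry
    being what pd.read_html(url) returned: a list of DataFrames.
    """
    # ignore_index=True renumbers the rows 0..n-1 across all pages
    return pd.concat([tables[0] for tables in pages], ignore_index=True)
```

With the real URLs this could be used as `pages = [pd.read_html(url, parse_dates=True) for url in urls]` followed by `stack_first_tables(pages).to_csv("file.csv", encoding="utf-8", index=False)`.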
In the output, the two values in some cells appear on one line, e.g. `€ 14,720 55.3%`.
As you can see, there is an amount and a percentage in the same cell, plus some empty columns that show up as NaN.
I want to split the amount from the percentage at the second space and move the percentage into a new column (Percentage) right next to it.
I'm trying `str.split`, but I get the error message "Dataframe object has no attribute list".
With the other methods I try, I get the same kind of error: "Dataframe object has no attribute .....".
I convert the urls list to a DataFrame with
`df = pd.DataFrame(urls)`
but I still don't understand whether this is the right way to convert it, because it keeps giving me the error.
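A minimal sketch of the split, assuming it is applied to a single DataFrame (for example the first element of the list that `pd.read_html` returns) and that the combined values are strings such as `€ 14,720 55.3%`. It uses `Series.str.extract` with a regex that keeps everything up to the second whitespace in the original column, and `DataFrame.insert` to place the remainder in a new column directly after it. The function name `split_at_second_space` and the column name `Value` in the usage line are hypothetical.

```python
import pandas as pd


def split_at_second_space(df, col, new_col="Percentage"):
    """Return a copy of df where `col` keeps the text before its second space
    and `new_col`, inserted right after `col`, holds the text after it."""
    # Group 1: the first two whitespace-separated tokens (e.g. "€ 14,720").
    # Group 2 (optional): everything after the second whitespace (e.g. "55.3%").
    parts = df[col].str.extract(r"^(\S+\s+\S+)(?:\s+(.+))?$")
    out = df.copy()
    # Where the pattern does not match (NaN, or a single token), keep the original value.
    out[col] = parts[0].fillna(df[col])
    # Place the new column immediately to the right of the original one.
    out.insert(out.columns.get_loc(col) + 1, new_col, parts[1])
    return out
```

For example, `table = split_at_second_space(tables[0], "Value")`. Cells without a percentage keep their amount and get NaN in `Percentage`. Matching `\s` instead of a literal `" "` also tolerates the non-breaking spaces that scraped HTML often contains; `insert` raises a `ValueError` if a `Percentage` column already exists.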
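The `'list' object has no attribute ...` kind of message suggests that `df` still holds the list returned by `pd.read_html` (one DataFrame per `<table>` it finds), whereas `pd.DataFrame(urls)` only builds a separate one-column frame of URL strings and has no effect on what `read_html` returns. The sketch below uses a hand-made list with made-up contents as a stand-in for the return value of `pd.read_html`, since the real URLs are placeholders.

```python
import pandas as pd

urls = ["https://url?date=2020-12-31", "https://url?date=2020-12-30"]

# pd.DataFrame(urls) only wraps the URL strings in a one-column DataFrame;
# it does not download or parse anything.
url_frame = pd.DataFrame(urls)

# Stand-in for pd.read_html(url): a *list* of DataFrames (contents made up).
tables = [pd.DataFrame({"Value": ["€ 14,720 55.3%", "€ 3,100"]})]

# A list has no .drop or .str; index into it to get the DataFrame first.
table = tables[0]
pieces = table["Value"].str.split(" ", n=2)  # Series of lists, one per cell
```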
Also, when I try to delete the empty columns with
`df.drop(df.columns[[0,1]], axis=1)`
I get the same kind of message:
`AttributeError: 'list' object has no attribute 'drop'`
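A sketch of both ways of dropping columns, assuming `df` is a single DataFrame rather than the list returned by `pd.read_html`. `DataFrame.dropna(axis=1, how="all")` removes the columns in which every cell is NaN, and `DataFrame.drop(columns=...)` removes columns picked by position through `df.columns`. Both return a new DataFrame, so the result has to be assigned back. The helper names are hypothetical.

```python
import pandas as pd


def drop_empty_columns(df):
    """Return a copy of df without the columns whose cells are all NaN."""
    return df.dropna(axis=1, how="all")


def drop_columns_by_position(df, positions):
    """Return a copy of df without the columns at the given 0-based positions."""
    # df.columns[positions] turns the positions into column labels for drop().
    return df.drop(columns=df.columns[positions])
```

For instance, `table = drop_empty_columns(tables[0])`. If a table has duplicate column labels, dropping by label removes every column carrying that label, so selecting the columns to keep with `df.iloc[:, ...]` would be safer in that case.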
So, two things. First, how do I split the values of a specific column at the second space and put the second part in a new column right next to it? Second, it would be great if I could drop the empty columns, either by column number or because their cells are empty.
Thanks
Question from:
https://stackoverflow.com/questions/65866460/split-values-in-column-by-column-number-with-pandas