Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Split values in column by column number with pandas

I am trying to scrape data from HTML tables with Python using pandas. The tables are at several URLs, so I put the URLs in a list.

Some cells in specific columns of each table hold two values. I manage to read all the data, print it, and save it to a CSV file. My code so far is:

# -*- coding: utf-8 -*-
import pandas as pd

urls = ["https://url?date=2020-12-31", "https://url?date=2020-12-30", "https://url?date=2020-12-29"]
df = pd.DataFrame(urls)

for url in urls:
    # pd.read_html returns a list of DataFrames, one per table found on the page
    df = pd.read_html(url, parse_dates=True)
    print(df[0])
    # append the first table of each page to the same CSV file
    df[0].to_csv('file.csv', encoding='utf-8', mode='a', header=False, index=False)
    print("Data have been extracted successfully")

In the output, the two values of some cells appear together, e.g. € 14,720 55.3%. As you can see, there is an amount and a percentage in the same cell, plus some empty columns that show up as NaN. I want to separate the amount from the percentage at the second space and move the percentage into a new column (Percentage) next to it. I am trying str.split, but I get an error message that "Dataframe object has no attribute list". Other methods I try give the same kind of error, that the DataFrame object has no attribute... I convert the urls list to a DataFrame with

df = pd.DataFrame(urls)

but I still don't understand whether this is the right way to convert it, because it keeps giving me the error.

Also, when I try to delete the empty columns with

df.drop(df.columns[[0,1]], axis=1)

I get a similar message:

AttributeError: 'list' object has no attribute 'drop'

So, two things. How do I split the values of a specific column at the second space and put the second part in a new column next to it? It would also be great if I could drop the empty columns, either by column number or by empty cells.

Thanks

question from:https://stackoverflow.com/questions/65866460/split-values-in-column-by-column-number-with-pandas


1 Answer


Based on the discussion in the comments, this is what you're looking for:

# simple reproducible example
import pandas as pd
df = pd.DataFrame()
df['values'] = ['€ 1,390 75%','€ 45 12.8%','€ 14,390 9%']

# this removes the currency symbol (everything before the first space)
def remove_symbol(x):
    x = x.split(' ')
    return ' '.join(x[1:])

# this splits the remaining string into two columns on the space separator;
# n is passed by keyword because recent pandas versions no longer accept it positionally
df[['money','percentage']] = df['values'].apply(remove_symbol).str.split(' ', n=1, expand=True)

Output:

   values       money   percentage
0  € 1,390 75%  1,390   75%
1  € 45 12.8%   45      12.8%
2  € 14,390 9%  14,390  9%
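
The question also asks about the AttributeError on drop and about removing the empty columns. The error message says 'list' object, which fits the fact that pd.read_html returns a list of DataFrames rather than a single DataFrame, so one table has to be picked from the list first. Below is a minimal sketch of that idea. The list named tables and its column names are made-up stand-ins for what read_html might return, not the real data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the return value of pd.read_html(url):
# a list holding one DataFrame per table on the page.
tables = [
    pd.DataFrame({
        "empty_a": [np.nan, np.nan],
        "name": ["A", "B"],
        "values": ["€ 14,720 55.3%", "€ 45 12.8%"],
        "empty_b": [np.nan, np.nan],
    })
]

# Pick one table first; the list itself has no .drop or .str
df = tables[0]

# Option 1: drop every column in which all cells are NaN
cleaned = df.dropna(axis=1, how="all")

# Option 2: drop columns by position (here the first and the fourth)
by_position = df.drop(df.columns[[0, 3]], axis=1)

print(cleaned.columns.tolist())
print(by_position.columns.tolist())
```

Both dropna and drop return a new DataFrame and leave df unchanged, so the result has to be assigned to a name, as done here.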

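
If the currency symbol should stay with the amount and the column is only known by its number, one possible variation is to split once from the right, which for strings like € 14,720 55.3% is the same as splitting at the second space. This is only a sketch under assumptions: the sample frame, the position col_pos, and the column name Percentage placement are illustrative choices, not taken from the original tables.

```python
import pandas as pd

# Hypothetical sample data; the combined amount/percentage column sits at position 1
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "values": ["€ 14,720 55.3%", "€ 45 12.8%", "€ 1,390 75%"],
    "other": [1, 2, 3],
})

col_pos = 1                       # assumed position of the combined column
col_name = df.columns[col_pos]

# Split once from the right: the text before the last space is the amount,
# the text after it is the percentage
parts = df.iloc[:, col_pos].str.rsplit(" ", n=1, expand=True)

# Keep the amount in the original column and place the percentage right next to it
df[col_name] = parts[0]
df.insert(col_pos + 1, "Percentage", parts[1])

print(df)
```

df.insert modifies the DataFrame in place, which makes it convenient for putting the new column directly after the one that was split.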
