Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas DataFrame: Replace characters conditionally

I have a DataFrame with a column named "Size". This column holds the sizes of a list of Android applications.

Size
8.7M
68M
2M

I need to replace these values with:

Size:
8700000
68000000
...

I thought about a function that checks whether the string contains a dot ('.'). If it does, replace the M with five zeros (00000); if not, replace the M with six zeros (000000). Could you help me with it?
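The string-based idea described above can be sketched without any floating-point arithmetic. This is only an illustration: the function name `expand_megabytes` is hypothetical, and it assumes every value ends in "M". It generalizes the "five zeros or six zeros" rule by padding whatever follows the dot to six digits, so values with more than one decimal place also work.

```python
def expand_megabytes(size: str) -> int:
    """Turn a string such as '8.7M' into 8700000 using string padding only."""
    # Drop the trailing unit, leaving e.g. '8.7' or '68'
    digits = size.strip().rstrip("M")
    # Split into the part before and after the dot ('' if there is no dot)
    whole, _, fraction = digits.partition(".")
    # Pad the fractional digits with zeros to exactly six places;
    # digits beyond the sixth place are cut off
    padded = (fraction + "000000")[:6]
    return int(whole + padded)


sizes = ["8.7M", "68M", "2M"]
converted = [expand_megabytes(s) for s in sizes]
print(converted)
```

On a DataFrame, the same function could be applied with `df['Size'].map(expand_megabytes)`.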


He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Answer


General solution for replacing multiple unit suffixes:

import pandas as pd

df = pd.DataFrame({'Size': ['8.7M', '68M', '2M']})

# multiplier for each unit suffix
_prefix = {'k': 1e3,    # thousand (kilo)
           'M': 1e6,    # million (mega)
           'B': 1e9,    # billion
}
# all keys of the dict joined by | (regex "or")
k = '|'.join(_prefix.keys())
# extract the number (a) and the optional suffix (b) into a new DataFrame
df1 = df['Size'].str.extract(r'(?P<a>[0-9.]*)(?P<b>' + k + ')?', expand=True)
# convert the numeric column to float
df1.a = df1.a.astype(float)
# map suffixes to multipliers; NaN (no suffix) becomes 1
df1.b = df1.b.map(_prefix).fillna(1)
# multiply the two columns together
df['Size'] = df1.a.mul(df1.b).astype(int)
print (df)
       Size
0   8700000
1  68000000
2   2000000
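If the column may also contain entries that are not sizes at all, the same approach can be wrapped in a function that leaves those rows missing instead of raising. The following is a sketch under assumptions: the names `UNIT_MULTIPLIERS` and `sizes_to_int`, the extra sample values "512k" and "n/a", and the choice of the nullable `Int64` dtype are all illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical multiplier table; extend it with whatever suffixes the data uses.
UNIT_MULTIPLIERS = {"k": 1e3, "M": 1e6, "B": 1e9}


def sizes_to_int(sizes: pd.Series) -> pd.Series:
    """Convert strings such as '8.7M' to integers; unparseable rows become <NA>."""
    units = "".join(UNIT_MULTIPLIERS)  # "kMB", used as a regex character class
    # Anchored pattern: a number, then at most one known unit suffix
    pattern = rf"^(?P<number>\d+(?:\.\d+)?)(?P<unit>[{units}])?$"
    parts = sizes.str.strip().str.extract(pattern)
    number = pd.to_numeric(parts["number"], errors="coerce")
    # Rows without a suffix get a multiplier of 1
    factor = parts["unit"].map(UNIT_MULTIPLIERS).fillna(1)
    # round() guards against float products that land just below a whole number
    return number.mul(factor).round().astype("Int64")


df = pd.DataFrame({"Size": ["8.7M", "68M", "2M", "512k", "n/a"]})
df["Size"] = sizes_to_int(df["Size"])
print(df)
```

Rounding before the integer cast is a deliberate choice here: a plain `astype(int)` truncates, so a product that floating-point arithmetic leaves slightly below a whole number would otherwise come out one too small.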

If you only need to replace M, the solution can be simplified:

df['Size'] = df['Size'].str.replace('M', '').astype(float).mul(1e6).astype(int)
print (df)
       Size
0   8700000
1  68000000
2   2000000


...