Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
208 views
in Technique[技术] by (71.8m points)

pandas - Conversion of years and months to months with string column input python

dataset example:

experience

5 month
nan
1 months
8 month
17 months
8 year
11 years
1.7 year
3.1 years
15.7 months
18 year
2017.2 years
98.3 years
68 year

I have a column with applicant's experience in years. It is very messy and I tried to go through it and create a sample. I have numbers followed by (month or months) and (year of years).

There are many nan entries and it should be ignored.

The goal is to create a column experience in months:

if nan
  copy nan to the corresponding column
if the row has month or months 
  copy the number to the corresponding column
if year or years in the row and the number <55 
  the number shall be multiplied by 12 and copied to the corresponding column
else copy nan to the corresponding column

How to achieve this?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Simple solution using reg expressions, keeping workings for transparency.

import numpy as np
df = pd.read_csv(io.StringIO("""experience

5 month
nan
1 months
8 month
17 months
8 year
11 years
1.7 year
3.1 years
15.7 months
18 year
2017.2 years
98.3 years
68 year"""))

df = df.assign(unit=lambda dfa: dfa["experience"].str.extract("([a-z]+)+"),
         val=lambda dfa: dfa["experience"].str.extract("([0-9,.]+)").astype(float),
         months=lambda dfa: np.where(dfa["unit"].isin(["month","months"]), dfa["val"],
                                    np.where(dfa["unit"].isin(["year","years"])
                                             &dfa["val"].lt(55), dfa["val"]*12, np.nan)))

print(df.to_string(index=False))

output

   experience    unit     val  months
      5 month   month     5.0     5.0
          NaN     NaN     NaN     NaN
     1 months  months     1.0     1.0
      8 month   month     8.0     8.0
    17 months  months    17.0    17.0
       8 year    year     8.0    96.0
     11 years   years    11.0   132.0
     1.7 year    year     1.7    20.4
    3.1 years   years     3.1    37.2
  15.7 months  months    15.7    15.7
      18 year    year    18.0   216.0
 2017.2 years   years  2017.2     NaN
   98.3 years   years    98.3     NaN
      68 year    year    68.0     NaN

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...