Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
python - Datetime out of date, hour and minute columns where NaNs are present (pandas). Is there a general solution for handling such data?

I am having some trouble combining columns to get one datetime column out of three columns containing the date, the hour and the minute.

Assume the following df (copy it and run df = pd.read_clipboard() to reproduce), with the dtypes noted below:

>>>df
         date  hour  minute
0  2021-01-01   7.0    15.0
1  2021-01-02   3.0    30.0
2  2021-01-02   NaN     NaN
3  2021-01-03   9.0     0.0
4  2021-01-04   4.0    45.0

>>>df.dtypes
date       object
hour      float64
minute    float64
dtype: object
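If the clipboard is not available, the same frame can also be built directly. This is a minimal sketch that assumes the blank cells are np.nan and the dates are plain strings, as the dtypes above suggest:

```python
import numpy as np
import pandas as pd

# Same values as the printed frame; 'hour' and 'minute' are float64
# because they contain NaN, 'date' holds plain strings
df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03', '2021-01-04'],
    'hour': [7.0, 3.0, np.nan, 9.0, 4.0],
    'minute': [15.0, 30.0, np.nan, 0.0, 45.0],
})
print(df.dtypes)
```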

I want to replace the three columns with a single column called 'datetime'. I have tried a few things, but I run into the following problems:

  1. I first create a 'time' column with df['time'] = (pd.to_datetime(df['hour'], unit='h') + pd.to_timedelta(df['minute'], unit='m')).dt.time and then try to concatenate it with 'date' via df['datetime'] = df['date'] + ' ' + df['time'], intending to convert the result afterwards with pd.to_datetime(df['datetime']). However, I get

    TypeError: can only concatenate str (not "datetime.time") to str

  2. If I convert 'hour' and 'minute' to str and concatenate the three columns into 'datetime', the NaN values get in the way and prevent me from converting 'datetime' to a datetime type.

  3. I have also tried first converting the 'date' column with df['date'] = df['date'].astype('datetime64[ns]'), creating the 'time' column again with df['time'] = (pd.to_datetime(df['hour'], unit='h') + pd.to_timedelta(df['minute'], unit='m')).dt.time, and then combining the two with df['datetime'] = pd.datetime.combine(df['date'], df['time']). This raises the following error and warning:

    TypeError: combine() argument 1 must be datetime.date, not Series

    FutureWarning: The pandas.datetime class is deprecated and will be removed from pandas in a future version. Import from datetime module instead.
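Approach 3 fails because datetime.datetime.combine (which the warning points to) works on a single date and a single time object, not on whole Series. As a sketch, it can be applied row by row instead, falling back to midnight when the time is missing. This is a slow Python-level loop, shown only to explain the error; the sample frame is rebuilt from the values above:

```python
import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03', '2021-01-04'],
    'hour': [7.0, 3.0, np.nan, 9.0, 4.0],
    'minute': [15.0, 30.0, np.nan, 0.0, 45.0],
})

dates = pd.to_datetime(df['date'])
# Same 'time' column as in the question; NaN hour/minute rows become NaT
times = (pd.to_datetime(df['hour'], unit='h')
         + pd.to_timedelta(df['minute'], unit='m')).dt.time

# combine() only accepts scalars, so call it once per row;
# a missing time is replaced by midnight (00:00:00)
df['datetime'] = pd.to_datetime([
    datetime.datetime.combine(d.date(), t if pd.notna(t) else datetime.time(0, 0))
    for d, t in zip(dates, times)
])
print(df)
```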

Is there a generic solution that combines the three columns and ignores the NaN values (treating a missing time as 00:00:00)?

What if a row contains only NaN values? Would it be possible to ignore all the NaNs and have 'datetime' be NaN for that row?

Thank you in advance, ^_^

Question from: https://stackoverflow.com/questions/65843239/datatime-out-of-a-date-hour-and-minute-column-where-nans-are-present-pandas


1 Answer


First convert date to datetimes, then add the hour and minute timedeltas, replacing missing values with a zero timedelta:

import pandas as pd

# zero-length timedelta, used in place of missing hours/minutes
td = pd.Timedelta(0)
df['datetime'] = (pd.to_datetime(df['date']) +
                  pd.to_timedelta(df['hour'], unit='h').fillna(td) +
                  pd.to_timedelta(df['minute'], unit='m').fillna(td))

print(df)
         date  hour  minute            datetime
0  2021-01-01   7.0    15.0 2021-01-01 07:15:00
1  2021-01-02   3.0    30.0 2021-01-02 03:30:00
2  2021-01-02   NaN     NaN 2021-01-02 00:00:00
3  2021-01-03   9.0     0.0 2021-01-03 09:00:00
4  2021-01-04   4.0    45.0 2021-01-04 04:45:00
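For the follow-up about a row that is entirely NaN: pd.to_datetime turns a missing date into NaT, and NaT plus any timedelta stays NaT, so the code above should already give NaT for such a row without extra handling. A quick check, assuming an all-NaN row is added to a shortened version of the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', np.nan],
    'hour': [7.0, np.nan, np.nan],
    'minute': [15.0, np.nan, np.nan],
})

td = pd.Timedelta(0)
df['datetime'] = (pd.to_datetime(df['date']) +
                  pd.to_timedelta(df['hour'], unit='h').fillna(td) +
                  pd.to_timedelta(df['minute'], unit='m').fillna(td))
# Row 1 has a date but no time -> midnight; row 2 has nothing -> NaT
print(df)
```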

Or you can use Series.add with fill_value=0:

df['datetime'] = (pd.to_datetime(df['date'])
                    .add(pd.to_timedelta(df['hour'], unit='h'), fill_value=0) 
                    .add(pd.to_timedelta(df['minute'], unit='m'), fill_value=0))
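For the "generic solution" part of the question, the first approach can be wrapped in a small helper that accepts any number of offset columns together with their pd.to_timedelta units. combine_date_parts is a hypothetical name used only for illustration, not a pandas API; the sketch also drops the source columns, since the question asks to replace them:

```python
import numpy as np
import pandas as pd


def combine_date_parts(frame, date_col, parts):
    """Hypothetical helper (not part of pandas).

    Adds each numeric column in `parts` (column name -> pd.to_timedelta unit)
    to the parsed date. Missing offsets count as zero; a missing date stays NaT.
    """
    result = pd.to_datetime(frame[date_col])
    zero = pd.Timedelta(0)
    for col, unit in parts.items():
        # NaN -> NaT timedelta -> zero, so a missing hour or minute doesn't blank the row
        result = result + pd.to_timedelta(frame[col], unit=unit).fillna(zero)
    return result


df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03', '2021-01-04', np.nan],
    'hour': [7.0, 3.0, np.nan, 9.0, 4.0, np.nan],
    'minute': [15.0, 30.0, np.nan, 0.0, 45.0, np.nan],
})

df['datetime'] = combine_date_parts(df, 'date', {'hour': 'h', 'minute': 'm'})
# Replace the three source columns with the single 'datetime' column
df = df.drop(columns=['date', 'hour', 'minute'])
print(df)
```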
