Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
140 views
in Technique[技术] by (71.8m points)

python - How to calculate total time difference for rows with same name using a pandas dataframe?

I have an excel file with Patient names and Dates. For example:

Column A=[1000, 1000, 1001, 1001, 1001, 1002, 1002, 1002, 1002]
Column B=['03/12/2009', '03/12/2009', '04/01/2011', '05/01/2010', '08/02/2011', '10/03/2012', '05/12/2010', '07/02/2011', '09/03/2012', '14/02/2013']

For each patient name, I want to calculate the total length of time between the first and final date. I thought to perform this via a pandas df, but I can't complete my code to match the total time for each patient. This is what I have tried so far:

Patient_Dates = pd.read_excel(r'C:UsersPycharmProjectsData.xlsx',
               sheet_name='Sheet 2', header=0, na_values=['NA'], usecols = "B:C")
Patient_Dates_new=Patient_Dates.iloc[3:len(Patient_Dates)]
Patient_Dates_new.columns = ['Patient_ID','Dates']

How can I calculate the total time in days for each patient? Here below is part of my excel fileenter image description here.

question from:https://stackoverflow.com/questions/65871362/how-to-calculate-total-time-difference-for-rows-with-same-name-using-a-pandas-da

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Say, the DataFrame is like

Patient_ID  Dates
0   1000    2009-03-12
1   1000    2009-03-12
2   1001    2011-04-01
3   1001    2010-05-01
4   1001    2011-08-02
5   1002    2012-10-03
6   1002    2010-05-12
7   1002    2011-07-02
8   1002    2012-09-03
df['Dates'] = pd.to_datetime(df['Dates'])
df['days'] = df.groupby(['Patient_ID'])['Dates'].transform(lambda x: x.max()-x.min())

output

Patient_ID  Dates   days
0   1000    2009-03-12  0 days
1   1000    2009-03-12  0 days
2   1001    2011-04-01  458 days
3   1001    2010-05-01  458 days
4   1001    2011-08-02  458 days
5   1002    2012-10-03  875 days
6   1002    2010-05-12  875 days
7   1002    2011-07-02  875 days
8   1002    2012-09-03  875 days

Alternatevily you can use

df['Dates'] = pd.to_datetime(df['Dates'])
grouped = df.groupby(['Patient_ID'])
op = []
for name, group in grouped:
    elapsed_day = group['Dates'].max() - group['Dates'].min()
    op.append([name, elapsed_day])

df_new = pd.DataFrame(op, columns=['Patient_ID', 'elapsed_day'])

df_new

output

Patient_ID  elapsed_day
0   1000    0 days
1   1001    458 days
2   1002    875 days

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...