Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Pandas: Best way to join two DataFrames based on a common column

I know this is a basic question, but please hear me out.

I have the following DataFrames:

In [722]: m1
Out[722]: 
   Person_id  Evidence_14 Feature_14
0        100         90.0       True
1        101          NaN        NaN
2        102         91.0       True
3        103          NaN        NaN
4        104         94.0       True
5        105          NaN        NaN
6        106          NaN        NaN

In [721]: m3
Out[721]: 
   Person_id  Evidence_14 Feature_14
0        100          NaN        NaN
1        101         99.0      False
2        102          NaN        NaN
3        103         95.0      False
4        104          NaN        NaN
5        105          NaN        NaN
6        106         93.0      False

Expected Output:

In [734]: z
Out[734]: 
   Person_id  Evidence_14 Feature_14
0        100         90.0       True
1        101         99.0      False
2        102         91.0       True
3        103         95.0      False
4        104         94.0       True
5        105          NaN        NaN
6        106         93.0      False

I can solve it like this:

In [725]: z = m1.merge(m3, on='Person_id')
In [728]: z['Evidence_14'] = z.Evidence_14_x.combine_first(z.Evidence_14_y)
In [731]: z['Feature_14'] = z.Feature_14_x.combine_first(z.Feature_14_y)
In [733]: z.drop(columns=['Evidence_14_x', 'Evidence_14_y', 'Feature_14_x', 'Feature_14_y'], inplace=True)

In [734]: z
Out[734]: 
   Person_id  Evidence_14 Feature_14
0        100         90.0       True
1        101         99.0      False
2        102         91.0       True
3        103         95.0      False
4        104         94.0       True
5        105          NaN        NaN
6        106         93.0      False
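For anyone who wants to reproduce this, here is a self-contained sketch of the same merge-then-`combine_first` approach. The DataFrame construction is an assumption rebuilt by hand from the printed output above, and `drop(columns=...)` is used because newer pandas versions no longer accept the axis as a positional argument.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output (assumed, not taken from the original post)
ids = [100, 101, 102, 103, 104, 105, 106]
m1 = pd.DataFrame({
    "Person_id": ids,
    "Evidence_14": [90.0, np.nan, 91.0, np.nan, 94.0, np.nan, np.nan],
    "Feature_14": [True, np.nan, True, np.nan, True, np.nan, np.nan],
})
m3 = pd.DataFrame({
    "Person_id": ids,
    "Evidence_14": [np.nan, 99.0, np.nan, 95.0, np.nan, np.nan, 93.0],
    "Feature_14": [np.nan, False, np.nan, False, np.nan, np.nan, False],
})

# Inner merge on the key; overlapping columns get _x / _y suffixes
z = m1.merge(m3, on="Person_id")

# For each overlapping column, keep the m1 value and fall back to m3 where it is NaN,
# then drop the suffixed helper columns
for col in ["Evidence_14", "Feature_14"]:
    z[col] = z[f"{col}_x"].combine_first(z[f"{col}_y"])
    z = z.drop(columns=[f"{col}_x", f"{col}_y"])

print(z)
```

The loop only removes the repetition of one `combine_first` call per column; it is still the same technique as the transcript above.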

Is there a cleaner or better way to do this? Am I missing something obvious?


1 Answer


If the column names match and the rows need to be matched by Person_id values, use:

m = m1.set_index('Person_id').combine_first(m3.set_index('Person_id')).reset_index()
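A minimal runnable sketch of this keyed version, assuming the question's data is rebuilt by hand (the construction below is an assumption based on the printed output):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's printed output (assumed)
ids = [100, 101, 102, 103, 104, 105, 106]
m1 = pd.DataFrame({
    "Person_id": ids,
    "Evidence_14": [90.0, np.nan, 91.0, np.nan, 94.0, np.nan, np.nan],
    "Feature_14": [True, np.nan, True, np.nan, True, np.nan, np.nan],
})
m3 = pd.DataFrame({
    "Person_id": ids,
    "Evidence_14": [np.nan, 99.0, np.nan, 95.0, np.nan, np.nan, 93.0],
    "Feature_14": [np.nan, False, np.nan, False, np.nan, np.nan, False],
})

# Use Person_id as the index so rows are aligned by key rather than by position,
# fill m1's missing values from m3, then turn Person_id back into a column
m = (
    m1.set_index("Person_id")
      .combine_first(m3.set_index("Person_id"))
      .reset_index()
)
print(m)
```

Where both frames have a non-missing value, the one from `m1` wins. Because `DataFrame.combine_first` aligns on the union of both indexes, a `Person_id` that appears in only one frame is still kept in the result, whereas the default inner `merge` in the question would drop it.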

If the index values are the same and Person_id also lines up row for row in both DataFrames, the solution can be simplified to match on the original index values:

m = m1.combine_first(m3)
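A minimal sketch of the index-aligned shortcut, again assuming the sample data is rebuilt by hand. The explicit check is an addition for illustration: this shortcut silently gives wrong results if the rows are not in the same Person_id order.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's printed output (assumed)
ids = [100, 101, 102, 103, 104, 105, 106]
m1 = pd.DataFrame({
    "Person_id": ids,
    "Evidence_14": [90.0, np.nan, 91.0, np.nan, 94.0, np.nan, np.nan],
    "Feature_14": [True, np.nan, True, np.nan, True, np.nan, np.nan],
})
m3 = pd.DataFrame({
    "Person_id": ids,
    "Evidence_14": [np.nan, 99.0, np.nan, 95.0, np.nan, np.nan, 93.0],
    "Feature_14": [np.nan, False, np.nan, False, np.nan, np.nan, False],
})

# This shortcut relies on both frames sharing the same index and row order,
# so check that Person_id matches position by position before using it
assert m1["Person_id"].equals(m3["Person_id"]), "rows are not aligned by Person_id"

# Fill NaNs in m1 with the values at the same index label in m3
m = m1.combine_first(m3)
print(m)
```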

...