Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - pandas - merging with missing values

There appears to be a quirk with the pandas merge function. It considers NaN values to be equal, and will merge NaNs with other NaNs:

>>> import numpy as np
>>> import pandas as pd

>>> foo = pd.DataFrame([
    ['a',1,2],
    ['b',4,5],
    ['c',7,8],
    [np.nan,10,11]
], columns=['id','x','y'])

>>> bar = pd.DataFrame([
    ['a',3],
    ['c',9],
    [np.nan,12]
], columns=['id','z'])

>>> pd.merge(foo, bar, how='left', on='id')
Out[428]: 
    id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11  12

[4 rows x 4 columns]

This is unlike any relational database I've seen. Normally, missing values are treated with agnosticism and won't be merged together as if they were equal. This is especially problematic for datasets with sparse data: every NaN will be merged to every other NaN, resulting in a huge DataFrame!
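To make the blow-up concrete, here is a minimal sketch. The frames `left` and `right` are hypothetical and exist only for illustration; each has one real key and two missing keys.

```python
import numpy as np
import pandas as pd

# Hypothetical sparse frames: one real key and two missing keys on each side.
left = pd.DataFrame({'id': ['a', np.nan, np.nan], 'x': [1, 2, 3]})
right = pd.DataFrame({'id': ['a', np.nan, np.nan], 'z': [10, 20, 30]})

merged = pd.merge(left, right, how='left', on='id')

# 'a' matches once; each of the 2 NaN rows on the left pairs with
# both NaN rows on the right, so the result has 1 + 2 * 2 = 5 rows.
print(len(merged))
```

With n missing keys on one side and m on the other, the NaN rows alone contribute n * m rows to the result.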

Is there a way to ignore missing values during a merge without first slicing them out?


1 Answer


You could exclude values from bar (and indeed foo if you wanted) where id is null during the merge. Not sure it's what you're after, though, as they are sliced out.

(I've assumed from your left join that you're interested in retaining all of foo, but only want to merge the parts of bar that match and are not null.)

foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')

Out[11]: 
    id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11 NaN
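If this comes up often, the same filtering could be wrapped in a small helper. The following is only a sketch: the function name `left_merge_skip_null_keys` is made up for illustration, and it assumes a single key column and a left join, as in the question.

```python
import numpy as np
import pandas as pd

def left_merge_skip_null_keys(left, right, on):
    """Left-merge on one key column without matching missing keys (hypothetical helper)."""
    # Drop right-hand rows whose key is null so nothing can pair with
    # the null keys on the left; how='left' still keeps every left row.
    right_keyed = right[right[on].notna()]
    return left.merge(right_keyed, how='left', on=on)

foo = pd.DataFrame([['a', 1, 2], ['b', 4, 5], ['c', 7, 8], [np.nan, 10, 11]],
                   columns=['id', 'x', 'y'])
bar = pd.DataFrame([['a', 3], ['c', 9], [np.nan, 12]], columns=['id', 'z'])

result = left_merge_skip_null_keys(foo, bar, on='id')
print(result)
```

The row of foo with the missing id is kept, but its z stays NaN because no row of bar is allowed to match it. `bar.dropna(subset=['id'])` should give the same filtered frame as the `notna()` mask.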
