python - NumPy equivalent of merge

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm transitioning some stuff from R to Python and am curious about merging efficiently. I've found some stuff on concatenate in NumPy (using NumPy for operations, so I'd like to stick with it), but it doesn't work as expected.

Take two datasets

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])

ID      Label
1a2     0
2dd     0
z83     1
fz3     0

and

d2 = np.array([['1a2', '33.3', '22.2'], 
               ['43m', '66.6', '66.6'], 
               ['z83', '12.2', '22.1']])

ID     val1   val2
1a2    33.3   22.2
43m    66.6   66.6
z83    12.2   22.1

I want to merge these together so that the result is

d3

ID    Label    val1    val2
1a2   0        33.3    22.2
z83   1        12.2    22.1

So it's identified rows that match on the ID column and then concatenated these together. This is relatively simple in R using merge, but in NumPy it's less obvious to me.

Is there a way to do this natively in NumPy that I am missing?

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:24:41+0000

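Here's one NumPy-based solution using masking. It assumes the IDs are unique within each array and that matching IDs appear in the same relative order in both arrays -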

import numpy as np

def numpy_merge_bycol0(d1, d2):
    # Mask of rows in d1 whose ID (column 0) also occurs in d2
    d1mask = np.isin(d1[:,0], d2[:,0])

    # Mask of rows in d2 whose ID (column 0) also occurs in d1
    d2mask = np.isin(d2[:,0], d1[:,0])

    # Keep the matching rows and join them column-wise,
    # dropping d2's ID column so it is not repeated in the output
    return np.c_[d1[d1mask], d2[d2mask,1:]]

Sample run -

In [43]: d1
Out[43]: 
array([['1a2', '0'],
       ['2dd', '0'],
       ['z83', '1'],
       ['fz3', '0']], dtype='|S3')

In [44]: d2
Out[44]: 
array([['1a2', '33.3', '22.2'],
       ['43m', '66.6', '66.6'],
       ['z83', '12.2', '22.1']], dtype='|S4')

In [45]: numpy_merge_bycol0(d1, d2)
Out[45]: 
array([['1a2', '0', '33.3', '22.2'],
       ['z83', '1', '12.2', '22.1']], dtype='|S4')
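
Because the masks keep each array's own row order, the masking version pairs rows correctly only when the matching IDs come in the same order in d1 and d2. As a rough sketch for inputs where that is not guaranteed, np.intersect1d with return_indices=True (available from NumPy 1.15) returns the positions of the common IDs in both arrays. The helper name merge_on_col0 and the reordered d2 below are made up for illustration, and the sketch still assumes IDs are unique within each array -

```python
import numpy as np

def merge_on_col0(d1, d2):
    # Hypothetical helper (not from the original answer): inner join on column 0.
    # Assumes IDs are unique within each array; output rows come back sorted by ID.
    _, i1, i2 = np.intersect1d(d1[:, 0], d2[:, 0], return_indices=True)

    # i1 and i2 index the same IDs in d1 and d2, so the rows line up
    return np.c_[d1[i1], d2[i2, 1:]]

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])

# Same rows as d2 in the question, but with the matching IDs in a different order
d2 = np.array([['z83', '12.2', '22.1'],
               ['43m', '66.6', '66.6'],
               ['1a2', '33.3', '22.2']])

d3 = merge_on_col0(d1, d2)
```

Note that the rows of d3 follow the sorted order of the IDs rather than the original order in d1.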

We could also use broadcasting to get the indices and then integer-indexing in place of masking, like so -

idx = np.argwhere(d1[:,0,None] == d2[:,0])
out = np.c_[d1[idx[:,0]], d2[idx[:,1],1:]]
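
For reference, the broadcasting idea can be written out as a self-contained snippet on the question's sample data. The intermediate name matches is introduced here only for readability. The comparison builds a len(d1) x len(d2) boolean array, so this is probably best kept to inputs where that array fits comfortably in memory -

```python
import numpy as np

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
d2 = np.array([['1a2', '33.3', '22.2'],
               ['43m', '66.6', '66.6'],
               ['z83', '12.2', '22.1']])

# Pairwise comparison of the ID columns, shape (len(d1), len(d2))
matches = d1[:, 0, None] == d2[:, 0]

# Each row of idx is a (row in d1, row in d2) pair whose IDs are equal
idx = np.argwhere(matches)

# Take the paired rows and join them, skipping d2's ID column
out = np.c_[d1[idx[:, 0]], d2[idx[:, 1], 1:]]
```

Unlike the masking version, the pairs in idx carry the row positions from both arrays, so the matching IDs do not need to appear in the same order.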
