Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
418 views
in Technique[技术] by (71.8m points)

python - Fill in same amount of characters where other column is NaN

I have the following dummy dataframe:

df = pd.DataFrame({'Col1':['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m'],
                   'Col2':['aa~bb~cc~dd', np.NaN, 'ii~jj~kk~ll~mm']})

        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             NaN
2  i,j,k,l,m  ii~jj~kk~ll~mm

The real dataset has shape 500000, 90.

I need to unnest these values to rows and I'm using the new explode method for this, which works fine.

The problem is the NaN, these will cause unequal lengths after the explode, so I need to fill in the same amount of delimiters as the filled values. In this case ~~~ since row 1 has three comma's.


expected output

        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             ~~~
2  i,j,k,l,m  ii~jj~kk~ll~mm

Attempt 1:

df['Col2'].fillna(df['Col1'].str.count(',')*'~')

Attempt 2:

np.where(df['Col2'].isna(), df['Col1'].str.count(',')*'~', df['Col2'])

This works, but I feel like there's an easier method for this:

characters = df['Col1'].str.replace('w', '').str.replace(',', '~')
df['Col2'] = df['Col2'].fillna(characters)

print(df)

        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             ~~~
2  i,j,k,l,m  ii~jj~kk~ll~mm

d1 = df.assign(Col1=df['Col1'].str.split(',')).explode('Col1')[['Col1']]
d2 = df.assign(Col2=df['Col2'].str.split('~')).explode('Col2')[['Col2']]

final = pd.concat([d1,d2], axis=1)
print(final)

  Col1 Col2
0    a   aa
0    b   bb
0    c   cc
0    d   dd
1    e     
1    f     
1    g     
1    h     
2    i   ii
2    j   jj
2    k   kk
2    l   ll
2    m   mm

Question: is there an easier and more generalized method for this? Or is my method fine as is.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

One way is using str.repeat and fillna() not sure how efficient this is though:

df.Col2.fillna(pd.Series(['~']*len(df)).str.repeat(df.Col1.str.count(',')))

0       aa~bb~cc~dd
1               ~~~
2    ii~jj~kk~ll~mm
Name: Col2, dtype: object

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...