python - Get first non-null value per row

Question

Welcome To Ask or Share your Answers For Others

python - Get first non-null value per row

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Get first non-null value per row

I have a sample dataframe show as below. For each line, I want to check the c1 first, if it is not null, then check c2. By this way, find the first notnull column and store that value to column result.

ID  c1  c2  c3  c4  result
1   a   b           a
2       cc  dd      cc
3           ee  ff  ee
4               gg  gg

I am using this way for now. but I would like to know if there is a better method.(The column name do not have any pattern, this is just sample)

df["result"] = np.where(df["c1"].notnull(), df["c1"], None)
df["result"] = np.where(df["result"].notnull(), df["result"], df["c2"])
df["result"] = np.where(df["result"].notnull(), df["result"], df["c3"])
df["result"] = np.where(df["result"].notnull(), df["result"], df["c4"])
df["result"] = np.where(df["result"].notnull(), df["result"], "unknown)

When there are lots of columns, this method looks not good.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:24:35+0000

Use back filling NaNs first and then select first column by iloc:

df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')

Or:

df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')

print (df)
   ID   c1   c2  c3   c4 result
0   1    a    b   a  NaN      a
1   2  NaN   cc  dd   cc     cc
2   3  NaN   ee  ff   ee     ee
3   4  NaN  NaN  gg   gg     gg

Performance:

df = pd.concat([df] * 1000, ignore_index=True)


In [220]: %timeit df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.78 ms per loop

In [221]: %timeit df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.7 ms per loop

#jpp solution
In [222]: %%timeit
     ...: cols = df.iloc[:, 1:].T.apply(pd.Series.first_valid_index)
     ...: 
     ...: df['result'] = [df.loc[i, cols[i]] for i in range(len(df.index))]
     ...: 
1 loop, best of 3: 180 ms per loop

#c???s????'  s solution
In [223]: %timeit df['result'] = df.stack().groupby(level=0).first()
1 loop, best of 3: 606 ms per loop

Categories

python - Get first non-null value per row

python - Get first non-null value per row

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags