Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Delete DF rows based on mixed condition (Pandas)

I want to delete DF rows based on a mixed condition. My DF contains these columns:

ID, SEQNO, DESCRIP

DF has multiple rows for a given 'ID', but each 'ID' + 'SEQNO' combination is unique. Suppose DF contains this data:

ID  SEQNO  DESCRIP
A1  1      This is test
A1  2      To check DF
A1  3      XYZ
A1  4      ghj
B1  1      Hello, How are you.
B1  2      XYZ
B1  3      I am Fine
B1  4      Thank You.
B1  5      and you.

Expected Output:

ID  SEQNO  DESCRIP
A1  1      This is test
A1  2      To check DF
B1  1      Hello, How are you.

For each ID, I want to keep only the rows whose SEQNO is less than the SEQNO of the row where 'XYZ' appears in that ID. I tried the code below, but it is very inefficient and takes a long time:

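# Rows whose DESCRIP contains 'XYZ' (case-insensitive; NaN counts as no match)
rdf = df[df['DESCRIP'].str.contains('XYZ', na=False, case=False)]

# For each match, drop that ID's rows from the matching SEQNO onward.
# Each iteration rescans and copies the whole frame, which is what makes it slow on large data.
for i in range(rdf.shape[0]):
    df = df[~((df['ID'] == rdf['ID'].iloc[i]) & (df['SEQNO'] >= rdf['SEQNO'].iloc[i]))]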
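For comparison, here is a minimal sketch (not from the original thread) of the same rule without the Python-level loop. The union of everything the loop removes is "rows at or after the smallest matching SEQNO of their ID", so it is enough to compute that cutoff once per ID and compare each row against it. It keeps the loop's matching rule (`str.contains` with `case=False`, `na=False`); the names `hit`, `cutoff`, `row_cutoff` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": ["A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1", "B1"],
    "SEQNO": [1, 2, 3, 4, 1, 2, 3, 4, 5],
    "DESCRIP": ["This is test", "To check DF", "XYZ", "ghj",
                "Hello, How are you.", "XYZ", "I am Fine", "Thank You.", "and you."],
})

# Same match rule as the loop: case-insensitive substring, NaN treated as no match
hit = df["DESCRIP"].str.contains("XYZ", na=False, case=False)

# Smallest matching SEQNO per ID; the loop's removals all start from this value
cutoff = df.loc[hit].groupby("ID")["SEQNO"].min()

# Look up each row's cutoff by its ID; IDs with no match get NaN
row_cutoff = df["ID"].map(cutoff)

# Keep rows before the cutoff, plus every row of an ID that never matches
result = df[row_cutoff.isna() | (df["SEQNO"] < row_cutoff)]
print(result)
```

Like the loop, this leaves IDs without any 'XYZ' row untouched, but it touches the frame a fixed number of times instead of once per match.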

I would appreciate any suggestions on making this faster. Thank you.


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer


Try:

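import numpy as np

# Forward fill relies on rows being in SEQNO order within each ID
df = df.sort_values(['ID', 'SEQNO'])

# Mark rows whose DESCRIP is exactly 'XYZ' with 1; all other rows get NaN
df['mask'] = np.where(df['DESCRIP'].eq('XYZ'), 1, np.nan)

# Within each ID, carry the mark forward to every later row
df['mask'] = df.groupby('ID')['mask'].ffill()

# Keep only the unmarked rows and drop the helper column
df = df.loc[df['mask'].ne(1), ['ID', 'SEQNO', 'DESCRIP']]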

In short: we mark the rows whose DESCRIP is 'XYZ', forward-fill that mark within each ID group so every later row of the same ID is marked too, then filter out the marked rows.

Output:

   ID SEQNO              DESCRIP
0  A1     1         This is test
1  A1     2          To check DF
4  B1     1  Hello, How are you.
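The mark, forward-fill and filter steps from this answer can be inspected on the sample data with the sketch below. It is a variant rather than the answer's exact code: instead of a 1/NaN column filled with `ffill`, it keeps a running count of 'XYZ' rows per ID with `groupby(...).cumsum()`, so a row is dropped once an 'XYZ' has appeared at or before it in its ID. It assumes exact 'XYZ' matching as in the answer; the names `is_xyz`, `seen` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": ["A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1", "B1"],
    "SEQNO": [1, 2, 3, 4, 1, 2, 3, 4, 5],
    "DESCRIP": ["This is test", "To check DF", "XYZ", "ghj",
                "Hello, How are you.", "XYZ", "I am Fine", "Thank You.", "and you."],
})

# Rows must be in SEQNO order within each ID for the running count to work
df = df.sort_values(["ID", "SEQNO"])

# Step 1: mark rows whose DESCRIP is exactly 'XYZ'
is_xyz = df["DESCRIP"].eq("XYZ")

# Step 2: running count of marks within each ID; it becomes >= 1 at the first
# 'XYZ' and stays there, the same effect as forward-filling the mark
seen = is_xyz.astype(int).groupby(df["ID"]).cumsum()

# Step 3: keep only rows where no 'XYZ' has been seen yet
result = df.loc[seen.eq(0), ["ID", "SEQNO", "DESCRIP"]]
print(seen.tolist())
print(result)
```

Printing `seen` shows the propagated mark per row, which makes it easy to check that only the rows before each ID's first 'XYZ' survive.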

...