Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Speed up a filtering process over millions of rows with pandas


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Answer

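Can you try this? Because Python's "and" short-circuits, the cheap word-count check runs first and the slow language detection is only called for messages with at least five words. Both conditions are evaluated in a single pass over the column.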

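import pandas as pd
from langdetect import detect, DetectorFactory  # pip install langdetect
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # make langdetect's results reproducible between runs

def custom_detect(x):
    """Return True if langdetect classifies the text as English."""
    try:
        return detect(x) == 'en'
    except LangDetectException:  # e.g. text with no detectable language features
        return False

# `df` is your existing DataFrame with a 'user_message' column (from the question).
# The word count runs first; `and` skips custom_detect() for messages under five words.
mask = df['user_message'].apply(lambda x: len(x.split()) >= 5 and custom_detect(x))
df_out = df[mask]
df_out.to_csv('output.csv')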
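To see the short-circuit saving in isolation, here is a small sketch that swaps langdetect for a hypothetical stand-in, expensive_check(), which only counts how often it is called. The sample messages are made up for illustration; the point is that the expensive function runs only for rows that pass the word-count test.

```python
import pandas as pd

calls = 0  # how many times the expensive check actually ran

def expensive_check(text):
    """Hypothetical stand-in for custom_detect(); it only counts its calls."""
    global calls
    calls += 1
    return True

df = pd.DataFrame({'user_message': [
    'hi there',                             # 2 words
    'ok',                                   # 1 word
    'this one has at least five words',     # 7 words
    'short text',                           # 2 words
    'another message that is long enough',  # 6 words
]})

# `and` evaluates left to right and stops at the first False,
# so expensive_check() never runs for messages under five words.
mask = df['user_message'].apply(lambda x: len(x.split()) >= 5 and expensive_check(x))

print(calls)          # 2
print(mask.tolist())  # [False, False, True, False, True]
```

Swapping the two operands would call the expensive check on all five rows, so the cheap test should always come first.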

Alternatively, spread the apply across CPU cores with pandarallel (https://github.com/nalepae/pandarallel):

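import pandas as pd
from langdetect import detect, DetectorFactory  # pip install langdetect
from langdetect.lang_detect_exception import LangDetectException
from pandarallel import pandarallel  # pip install pandarallel

pandarallel.initialize()  # set up worker processes (nb_workers controls how many)
DetectorFactory.seed = 0  # make langdetect's results reproducible between runs

def custom_detect(x):
    """Return True if langdetect classifies the text as English."""
    try:
        return detect(x) == 'en'
    except LangDetectException:  # e.g. text with no detectable language features
        return False

# parallel_apply works like apply, but processes chunks of the column in the workers.
mask = df['user_message'].parallel_apply(lambda x: len(x.split()) >= 5 and custom_detect(x))
df_out = df[mask]
df_out.to_csv('output.csv')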
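A possible variant of the same idea (not part of the answer above) is to split the filter into two stages: a word count computed through pandas' .str accessor over the whole column, then language detection applied only to the rows that survive. In this sketch is_english() is a hypothetical stand-in for custom_detect() (it just looks for the word "the") so the example runs without langdetect; the sample data is made up.

```python
import pandas as pd

def is_english(text):
    """Hypothetical stand-in for custom_detect(): looks for the word 'the'."""
    return 'the' in text.split()

df = pd.DataFrame({'user_message': [
    'hello',
    'the quick brown fox jumps',
    'una frase con cinco palabras',
    'the end',
]})

# Stage 1: word count via the .str accessor; missing (NaN) messages compare as False.
long_enough = df['user_message'].str.split().str.len() >= 5
candidates = df[long_enough]

# Stage 2: run the expensive check only on the survivors.
# astype(bool) keeps the mask boolean even if no rows survive stage 1.
df_out = candidates[candidates['user_message'].apply(is_english).astype(bool)]

print(df_out['user_message'].tolist())  # ['the quick brown fox jumps']
```

Stage 2 can use parallel_apply instead of apply in the same way as the pandarallel version above.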


...