Can you try this?
The `and` operator short-circuits, so the language detection only runs for messages that already pass the length check, and both conditions are evaluated in a single pass over the column.
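import pandas as pd
from langdetect import detect  # pip install langdetect

def custom_detect(x):
    # True only if langdetect identifies the text as English.
    try:
        return detect(x) == 'en'
    except Exception:
        # detect() fails on text it cannot classify (e.g. empty strings).
        return False

# df is your existing DataFrame with a 'user_message' column.
# Keep rows with at least 5 words that are detected as English.
df_out = df[df['user_message'].apply(lambda x: len(x.split(' ')) >= 5 and custom_detect(x))]
df_out.to_csv('output.csv')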
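To see the short-circuit behaviour without installing anything, here is a minimal sketch. `fake_detect` is a hypothetical stand-in for the langdetect-based check (it is not part of any library) and simply records which messages it was called with; the sample messages are made up for illustration.

```python
calls = []  # every message that reaches the "expensive" check

def fake_detect(text):
    # Hypothetical stand-in for the language check; always answers True.
    calls.append(text)
    return True

def keep(text):
    # Same shape as the lambda above: cheap length test first,
    # expensive check only if the length test passes.
    return len(text.split(' ')) >= 5 and fake_detect(text)

messages = ["too short", "this message has five words", "hi"]
kept = [m for m in messages if keep(m)]

print(kept)        # messages that passed both checks
print(len(calls))  # how many times the expensive check actually ran
```

Only the five-word message reaches `fake_detect`; the two short ones are rejected by the length test alone, which is why putting the cheap condition first saves time on a large column.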
Using pandarallel (https://github.com/nalepae/pandarallel) to run the same filter in parallel:
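import pandas as pd
from langdetect import detect  # pip install langdetect
from pandarallel import pandarallel  # pip install pandarallel

pandarallel.initialize()  # set up the worker processes before calling parallel_apply

def custom_detect(x):
    # True only if langdetect identifies the text as English.
    try:
        return detect(x) == 'en'
    except Exception:
        # detect() fails on text it cannot classify (e.g. empty strings).
        return False

# df is your existing DataFrame with a 'user_message' column.
# Same filter as before, with the rows split across the workers.
df_out = df[df['user_message'].parallel_apply(lambda x: len(x.split(' ')) >= 5 and custom_detect(x))]
df_out.to_csv('output.csv')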