Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - how do I best validate email in pandas data frame

I have a data frame (df) with emails and numbers like

    email                          euro
0   [email protected]      150
1   [email protected]     50
2   [email protected]      300
3   kjfslkfj                         0
4   [email protected]    200

I need two filtered results: one with all rows that have a valid email and euro greater than or equal to 100, and another with all rows that have a valid email and euro lower than 100. I know that I can build the euro conditions like this

df_gt_100 = df.euro >= 100

and

df_lt_100 = df.euro < 100

But I can't find a way to filter the email addresses. I imported the validate_email package and tried things like this

validate_email(df.email)

which gives me a TypeError: expected string or bytes-like object.

Can anyone please give me a hint on how to approach this? It would be nice if I could do it all in one filter with the AND and OR operators.

Thanks in advance, Manuel


1 Answer


Use apply to call validate_email on each value of the column, combine the resulting mask with the euro condition using & (AND), and filter by boolean indexing:

from validate_email import validate_email

df1 = df[(df['euro'] >= 100) & df['email'].apply(validate_email)]
print(df1)
                         email  euro
0    [email protected]   150
2    [email protected]   300
4  [email protected]   200
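
The TypeError in the question comes from passing the whole column to a function that expects a single string; apply calls the function once per value instead. Below is a minimal sketch of that idea which also builds the second (lower than 100) result by reusing the same mask. The sample addresses are invented, and is_valid_email is a hypothetical stand-in written with the standard re module so the snippet runs without the validate_email package; it is a loose format check, not a full validator.

```python
import re
import pandas as pd

# Invented sample data for illustration (not the asker's real addresses).
df = pd.DataFrame({
    "email": ["anna@example.com", "bob@example.org", "carl@example.net",
              "kjfslkfj", "dora@example.com"],
    "euro": [150, 50, 300, 0, 200],
})

# Loose pattern: something, "@", something, a literal dot, something.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_valid_email(value):
    """Hypothetical stand-in for validate_email: takes ONE value, returns a bool."""
    return isinstance(value, str) and EMAIL_RE.fullmatch(value) is not None

# apply calls the function once per element and returns a boolean Series.
valid = df["email"].apply(is_valid_email)

# Reuse the mask for both results; parentheses are required around comparisons.
df_ge_100 = df[valid & (df["euro"] >= 100)]
df_lt_100 = df[valid & (df["euro"] < 100)]

print(df_ge_100)
print(df_lt_100)
```

Computing the mask once avoids validating every address twice. Replacing is_valid_email with validate_email in the apply call should give the same structure with the package's own checks.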

Another approach uses a regex with str.contains (note the escaped dot, since an unescaped . matches any character):

df1 = df[(df['euro'] >= 100) & df['email'].str.contains(r'[^@]+@[^@]+\.[^@]+')]
print(df1)
                         email  euro
0    [email protected]   150
2    [email protected]   300
4  [email protected]   200
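
A variation on the regex approach, sketched under a few assumptions: str.contains matches anywhere in the string, so this version uses str.fullmatch (available in pandas 1.1 and later) to require the whole value to match, and na=False so that missing emails count as invalid instead of producing NaN in the mask. The sample data is invented, and the last row shows how | (OR) and ~ (NOT) combine with the same mask.

```python
import pandas as pd

# Invented sample data, including a missing value and a malformed address.
df = pd.DataFrame({
    "email": ["anna@example.com", "bob@example.org", None,
              "kjfslkfj", "dora@example,com"],
    "euro": [150, 50, 300, 0, 200],
})

# Loose format check; the dot is escaped so it only matches a literal ".".
pattern = r"[^@\s]+@[^@\s]+\.[^@\s]+"

# Vectorized check over the whole column; na=False maps missing values to False.
valid = df["email"].str.fullmatch(pattern, na=False)

# AND: valid email and at least 100 euro.
df_ge_100 = df[valid & (df["euro"] >= 100)]

# OR / NOT: rows that need attention (invalid email or less than 100 euro).
df_flagged = df[~valid | (df["euro"] < 100)]

print(df_ge_100)
print(df_flagged)
```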
