python - Partial String match on both side of the columns in pandas

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

[Code]

import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
}
df1 = pd.DataFrame(d)
print(df1)

# For each username, collect the rows whose email starts with that username.
df = pd.DataFrame()
for idx, row in df1.iterrows():
    matched = df1[df1['email'].str.startswith(row['username'])]
    if not matched.empty:
        df = pd.concat([df, matched])
df

Using the above code, I can filter all the rows where the right-hand column partially matches (i.e. email => username).

Current Output:

But I want the reverse match as well (i.e. username => email), as below:

Expected Output:

Thanks in advance,

question from: https://stackoverflow.com/questions/65642812/partial-string-match-on-both-side-of-the-columns-in-pandas

1 Answer

深蓝 · Answer 1 · 2021-10-06T18:46:36+0000

Something like this works. The reverse task requires some minimum condition to match on; in this case, the first three characters of the email must match the start of the username.

Hopefully, this gets you started in the right direction.


import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
}
df1 = pd.DataFrame(d)


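# Forward match: the email starts with the full username.
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
# Reverse match: the username starts with the first three characters of the email.
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][:3]), axis=1)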

print(df1)

  ID   username             email  email_match  user_match
0  1    haabi.g     [email protected]        False       False
1  4    pugal.g  [email protected]         True        True
2  5   janani.g  [email protected]        False        True
3  9  hajacob.h     [email protected]        False       False
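
To go from the two flags to the filtered rows the question asks for, the masks can be combined with `|`. This is only a minimal sketch: the addresses in the post are redacted, so the data below is invented, and `sample` and `either` are names made up for illustration. It assumes the same three-character minimum for the reverse match.

```python
import pandas as pd

# Hypothetical data: the real addresses are redacted, so these are invented.
sample = pd.DataFrame({
    'ID': ['1', '4', '5'],
    'username': ['haabi.g', 'pugal.g', 'janani.g'],
    'email': ['abi.g@example.com', 'pugal.g@example.com', 'jan@example.com'],
})

# Forward match: the email starts with the full username.
email_match = sample.apply(lambda r: r['email'].startswith(r['username']), axis=1)
# Reverse match: the username starts with the first three characters of the email.
user_match = sample.apply(lambda r: r['username'].startswith(r['email'][:3]), axis=1)

# Keep the rows that match in either direction.
either = sample[email_match | user_match]
print(either)
```

Use `&` instead of `|` if a row should only be kept when both directions match.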

You can also add a counting mechanism to see how many consecutive leading characters match.


def user_match(x):
    # Compare the email's local part (before '@') with the username, character by character.
    name = x['email'].split('@')[0]
    user = x['username']
    count = 0
    for a, b in zip(name, user):
        if a != b:
            break
        count += 1
    # Report the run length only when it reaches the three-character minimum;
    # shorter runs count as no match.
    return count if count >= 3 else 0

df1['count'] = df1.apply(user_match, axis=1)

  ID   username             email  email_match  user_match  count
0  1    haabi.g     [email protected]        False       False      0
1  4    pugal.g  [email protected]         True        True      7
2  5   janani.g  [email protected]        False        True      3
3  9  hajacob.h     [email protected]        False       False      0
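
If you would rather not write the loop by hand, the standard library's `os.path.commonprefix` compares strings character by character, so it can probably do the counting for you. A rough sketch with invented addresses (the real ones are redacted); `prefix_len` and `sample` are hypothetical names, and unlike the function above this one returns the raw length without applying the three-character minimum.

```python
import os
import pandas as pd

def prefix_len(username, email):
    """Length of the shared leading run between the username and the email's local part."""
    local = email.split('@')[0]
    # commonprefix works character by character, not on path components.
    return len(os.path.commonprefix([username, local]))

# Hypothetical data: the real addresses are redacted, so these are invented.
sample = pd.DataFrame({
    'username': ['haabi.g', 'pugal.g', 'janani.g'],
    'email': ['abi.g@example.com', 'pugal.g@example.com', 'jan@example.com'],
})
sample['count'] = [prefix_len(u, e) for u, e in zip(sample['username'], sample['email'])]
print(sample)
```

You can then apply whatever threshold you like afterwards, e.g. `sample[sample['count'] >= 3]`.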
