Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Partial String match on both side of the columns in pandas

import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
}
df1 = pd.DataFrame(d)
print(df1)

[Image: printed df1, not preserved]

df = pd.DataFrame()
for idx, row in df1.iterrows():
    d = df1[df1['email'].str.startswith(row['username'])]
    if not d.empty:
        df = pd.concat([df, d])
df

Using the above code, I can filter all rows with a partial match on the right-hand column (i.e. email => username).

Current Output:

[Image: current output, not preserved]

But I also want the reverse match (i.e. username => email), as shown below.

Expected Output:

[Image: expected output, not preserved]

Thanks in advance.

question from:https://stackoverflow.com/questions/65642812/partial-string-match-on-both-side-of-the-columns-in-pandas


1 Answer


Something like this works. The reverse check needs a minimum condition to match on; here, the first three characters of the email must match the start of the username.

Hopefully, this gets you started in the right direction.


import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
}
df1 = pd.DataFrame(d)


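# True when the email starts with the full username
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
# True when the username starts with the first three characters of the email
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][0:3]), axis=1)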

print(df1)


  ID   username             email  email_match  user_match
0  1    haabi.g     [email protected]        False       False
1  4    pugal.g  [email protected]         True        True
2  5   janani.g  [email protected]        False        True
3  9  hajacob.h     [email protected]        False       False

You can also add a counter to find out how many consecutive leading characters match.


def user_match(x):
    # Compare the part of the email before '@' with the username, character by character.
    name = x['email'].split('@')[0]
    user = x['username']
    count = 0
    for a, b in zip(name, user):
        if a != b:
            break  # stop at the first mismatch
        count += 1
    # Report the prefix length only when at least three characters match; otherwise 0.
    return count if count >= 3 else 0

df1['count'] = df1.apply(user_match, axis=1)

print(df1)


  ID   username             email  email_match  user_match  count
0  1    haabi.g     [email protected]        False       False      0
1  4    pugal.g  [email protected]         True        True      7
2  5   janani.g  [email protected]        False        True      3
3  9  hajacob.h     [email protected]        False       False      0
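
The same prefix count can probably be obtained from the standard library as well. A sketch, assuming the same comparison as `user_match` (the part of the email before `@` against the username): `os.path.commonprefix` compares strings character by character, so it is not limited to paths. The name `prefix_count`, the `min_len` parameter and the sample addresses are hypothetical.

```python
import os

def prefix_count(username, email, min_len=3):
    """Length of the shared leading characters between the email's local
    part and the username, or 0 if fewer than min_len characters match."""
    local = email.split('@')[0]
    common = os.path.commonprefix([local, username])
    return len(common) if len(common) >= min_len else 0

# Hypothetical values for illustration.
print(prefix_count('pugal.g', 'pugal.g@example.com'))   # 7
print(prefix_count('janani.g', 'jan@example.com'))      # 3
print(prefix_count('hajacob.h', 'jacob@example.com'))   # 0
```

It can be applied to the frame with a list comprehension over `zip(df1['username'], df1['email'])`.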
