I have a dataframe column, teams, where I am trying to split the team name, 'CubsWhite Sox', into two parts, 'Cubs' and 'White Sox'.
import pandas as pd
import re
data = [{'teams':'CubsWhite Sox','area':'Chicago','league': 'MLB'}, {'teams': 'Red Sox','area':'Boston', 'league': 'MLB'}, {'teams': 'Blue Jay','area':'Toronto', 'league': 'MLB'}]
df = pd.DataFrame(data)
df
So far I could only achieve this result:
df["team"] = df.apply(lambda x: re.findall(r"[A-Z][^A-Z]*(?:\s[A-Z][^A-Z]*)", x["teams"]), axis=1)
df
           teams     area league         team
0  CubsWhite Sox  Chicago    MLB  [White Sox]
1        Red Sox   Boston    MLB    [Red Sox]
2       Blue Jay  Toronto    MLB   [Blue Jay]
Also, there are two spaces after 'White', 'Red' and 'Blue', as I discovered from this:
df["team"] = df.apply(lambda x: re.findall(r"[A-Z0-9][^A-Z]*", x["teams"]), axis=1)
df
           teams     area league                 team
0  CubsWhite Sox  Chicago    MLB  [Cubs, White , Sox]
1        Red Sox   Boston    MLB         [Red , Sox]
2       Blue Jay  Toronto    MLB        [Blue , Jay]
I can easily collapse those extra spaces into a single space with
df['teams'] = df['teams'].str.replace(r' +', ' ', regex=True)
Can you please help me split these team names like this, using re.findall?
df
           teams     area league               team
0  CubsWhite Sox  Chicago    MLB  [Cubs, White Sox]
1        Red Sox   Boston    MLB          [Red Sox]
2       Blue Jay  Toronto    MLB         [Blue Jay]
Thank you!
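One possible approach, offered as a minimal sketch rather than a definitive answer: treat a team name as a capitalized word followed by zero or more further capitalized words separated by spaces, and let re.findall collect each such run. The helper name split_teams and the constant TEAM_PATTERN are hypothetical names introduced for illustration. The sketch assumes every word in a team name starts with an uppercase letter followed only by lowercase letters, so names containing digits, hyphens, or lowercase words would need a different pattern.

```python
import re

import pandas as pd

# A capitalized word, optionally followed by more capitalized words,
# each separated by one or more spaces (assumed naming convention).
TEAM_PATTERN = re.compile(r"[A-Z][a-z]*(?: +[A-Z][a-z]*)*")


def split_teams(name):
    """Split a run-together string of team names into a list of names."""
    # Collapse any run of whitespace to a single space before matching.
    normalized = re.sub(r"\s+", " ", name).strip()
    return TEAM_PATTERN.findall(normalized)


data = [
    {"teams": "CubsWhite Sox", "area": "Chicago", "league": "MLB"},
    {"teams": "Red Sox", "area": "Boston", "league": "MLB"},
    {"teams": "Blue Jay", "area": "Toronto", "league": "MLB"},
]
df = pd.DataFrame(data)

# Apply to the single column instead of row-wise over the whole frame.
df["team"] = df["teams"].apply(split_teams)
print(df["team"].tolist())
# [['Cubs', 'White Sox'], ['Red Sox'], ['Blue Jay']]
```

The split between 'Cubs' and 'White Sox' happens because the optional group requires at least one space before the next capital letter, so a capital that directly follows a lowercase letter starts a new match.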
question from:
https://stackoverflow.com/questions/65835567/how-to-separate-a-string-with-2-uppercases-and-a-space-with-regex-in-pandas-data