Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python: How to extract multiple strings from pandas dataframe column

I have a dataset with a column containing strings in the following format:

Building = Building_A and Floor = Floor_4
Building = Building_D and Floor = Floor_2

I would like to extract only the building and floor names, concatenated into a single string in a new column, e.g.:

Building_A/Floor_4
Building_D/Floor_2

I've spent about an hour looking through previous posts and was not able to find something to match what I am trying to do. Any help would be appreciated.

question from:https://stackoverflow.com/questions/65886047/pandas-regex-search

1 Answer

Assume we have a DataFrame df:

import pandas as pd
df = pd.DataFrame({'txt': ["Building = Building_A and Floor = Floor_4",
                           "Building = Building_Z and Floor = Floor_9",
                           "Building = Martello and Floor = Ground"]})

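First, define the pattern to extract:

pat = r"(Floor_\d+)|(Building_\w{1})"

This matches only names of the form Floor_<digits> and Building_<one character>. Alternatively, if you want every word that follows "= ":

pat = r"(?<== )(\w+)"

Note the lookbehind (?<== ) in the pattern definition: it requires "= " immediately before the match without including it in the result.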
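To see how the two patterns differ before involving pandas, here is a minimal sketch using only the standard re module. The names samples, explicit and after_equals are made up for this illustration, and the capturing groups are dropped so that findall returns plain strings.

```python
import re

# Sample strings in the format from the question (illustrative only).
samples = [
    "Building = Building_A and Floor = Floor_4",
    "Building = Martello and Floor = Ground",
]

# Explicit pattern: Floor_<digits> or Building_<one word character>.
explicit = re.compile(r"Floor_\d+|Building_\w")

# Lookbehind pattern: any word immediately preceded by "= ".
after_equals = re.compile(r"(?<== )\w+")

explicit_hits = [explicit.findall(s) for s in samples]
lookbehind_hits = [after_equals.findall(s) for s in samples]

print(explicit_hits)    # the second sample yields no matches
print(lookbehind_hits)  # both samples yield two matches each
```

The explicit pattern finds nothing in the second string because "Martello" and "Ground" do not follow the Building_/Floor_ naming, so the lookbehind version is probably the safer choice if the names are not uniform.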

Then apply a lambda function to the txt column:

df['txt_extract'] = df[['txt']].apply(
    lambda r: "/".join(r.str.extractall(pat).stack()), axis=1)

Result (using the lookbehind pattern):

0    Building_A/Floor_4
1    Building_Z/Floor_9
2    Martello/Ground

Instead of str.extract, use str.extractall, which finds all occurrences of the pattern. The matches are stacked and joined with the "/" separator. Note that the order in which matches are found is preserved, which may be important in your case.
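If the row-wise apply feels heavy, one possible alternative (not the approach used above, just a sketch under the same sample data) is to let str.findall collect all matches per row and str.join glue them together. The pattern here is the lookbehind one without a capturing group, so each match comes back as a plain string.

```python
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame({"txt": [
    "Building = Building_A and Floor = Floor_4",
    "Building = Building_Z and Floor = Floor_9",
    "Building = Martello and Floor = Ground",
]})

# Any word immediately preceded by "= " (no capturing group needed).
pat = r"(?<== )\w+"

# findall returns a list of matches per row, in the order they occur;
# str.join then concatenates each list with "/".
df["txt_extract"] = df["txt"].str.findall(pat).str.join("/")

print(df["txt_extract"].tolist())
```

Like extractall, findall keeps matches in the order they occur, so the building still comes before the floor. A row with no match at all should end up as an empty string rather than raising an error.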

