Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - rsplit on pandas series with regular expression not working

rsplit on a pandas Series using a regular expression is not working. I want to split the Series on a separator without removing the separator.

import pandas as pd

df2 = pd.Series(['Series of Class A', 'Series of Class B part of Class C', 'Class D', 'Class'])
seperator = 'Class'
data = df2.str.split(r'.(?=' + seperator + ')', n=2, expand=True)

The result is:

           0                1        2
0  Series of          Class A     None
1  Series of  Class B part of  Class C
2    Class D             None     None
3      Class             None     None
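For reference, the separator survives because of the lookahead: in r'.(?=Class)' the part (?=Class) only asserts that "Class" follows and consumes nothing, so the single character matched by "." (here the space) is all that gets removed. A minimal plain-Python comparison with the standard re module, splitting on the literal word instead:

```python
import re

text = "Series of Class A"

# Splitting on the word itself consumes it, so "Class" disappears from the output
print(re.split(r"Class", text))        # ['Series of ', ' A']

# The lookahead only asserts that "Class" follows; "." consumes the space before it
print(re.split(r".(?=Class)", text))   # ['Series of', 'Class A']
```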

I want to do the same thing using rsplit.

I tried:

data = df2.str.rsplit(r'.(?=' + seperator + ')', n=2, expand=True)

I expected the same result from rsplit:

           0                1        2
0  Series of          Class A     None
1  Series of  Class B part of  Class C
2    Class D             None     None
3      Class             None     None

1 Answer

Unfortunately, pd.Series.str.rsplit does not work as documented (v0.25, stable/v1+). The project's GitHub issue tracker has an open bug report from Nov. 2019 stating that rsplit does not work with regex patterns (reported for v0.24.2 and v0.25.2). Internally, the method calls Python's built-in str.rsplit, which does not support regular expressions.
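The difference is easy to reproduce outside pandas. This small sketch uses only the standard library and the pattern from the question: the built-in str.rsplit treats its argument as a literal substring, while re.split interprets the same text as a regular expression.

```python
import re

text = "Series of Class B part of Class C"
pat = r".(?=Class)"

# str.rsplit searches for the literal text ".(?=Class)", which never occurs,
# so the string comes back unsplit
print(text.rsplit(pat, 2))   # ['Series of Class B part of Class C']

# re.split interprets the same pattern as a regex
print(re.split(pat, text))   # ['Series of', 'Class B part of', 'Class C']
```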

Fortunately, the reporter, jamespreed, also posted a homegrown alternative function:

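import re

def str_rsplit(arr, pat=None, n=None):
    """Regex-aware stand-in for Series.str.rsplit (jamespreed's workaround).

    Returns a Series of lists; missing values are left untouched.
    """
    if pat is None or len(pat) == 1:
        # Whitespace or single-character separator: Python's str.rsplit is enough
        if n is None or n == 0:
            n = -1
        f = lambda x: x.rsplit(pat, n)
    else:
        # Multi-character pattern: treat it as a regular expression
        if n is None or n == -1:
            n = 0
        regex = re.compile(pat)
        def f(x):
            # Split on every match, then keep only the last n pieces separate
            s = regex.split(x)
            a, b = s[:-n], s[-n:]   # with n == 0, a is empty and b holds every piece
            if not a:
                return b
            # Re-join the left-hand pieces by finding where the last one ends
            # in the original string, so the separators between them are kept
            ix = 0
            for a_ in a:
                ix = x.find(a_, ix) + len(a_)
            x_ = [x[:ix]]
            return x_ + b
    # Apply f element-wise, skipping missing values
    # (stands in for the private pandas helper _na_map)
    return arr.map(f, na_action='ignore')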
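If you would rather not depend on that function, here is a minimal alternative sketch that works on a single string. The name regex_rsplit is hypothetical (it is not a pandas API). It finds every non-overlapping match with re.finditer, keeps only the rightmost n of them, and slices the string around those matches. It assumes the pattern has no capturing groups (re.split would also return group contents) and takes "rightmost n matches" to mean the last n of the usual left-to-right matches.

```python
import re


def regex_rsplit(s, pat, n=None):
    """Split s on the regex pat, doing at most n splits counted from the right.

    Hypothetical helper: n=None, 0 or a negative value means "split on every
    match", mirroring how pandas treats n in str.split/str.rsplit.
    """
    # All non-overlapping matches, found left to right
    matches = list(re.finditer(pat, s))
    if n is not None and n > 0:
        # Keep only the rightmost n matches; text to their left stays intact
        matches = matches[-n:]
    pieces = []
    start = 0
    for m in matches:
        # Everything between the previous used match and this one is a piece
        pieces.append(s[start:m.start()])
        start = m.end()
    # Whatever follows the last used match is the final piece
    pieces.append(s[start:])
    return pieces


for text in ["Series of Class A", "Series of Class B part of Class C", "Class D", "Class"]:
    print(regex_rsplit(text, r".(?=Class)", n=2))
```

With n=1 the difference from split shows up: the second string becomes ['Series of Class B part of', 'Class C'] rather than ['Series of', 'Class B part of Class C']. On a Series it could be applied element-wise, e.g. df2.map(lambda s: regex_rsplit(s, r'.(?=Class)', n=2)), and wrapping the resulting lists with pd.DataFrame(result.tolist(), index=df2.index) should give an expand=True-style frame, with shorter rows padded with missing values.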

...