Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
595 views
in Technique[技术] by (71.8m points)

regex - Python: Split string by list of separators

In Python, I'd like to split a string using a list of separators. The separators could be either commas or semicolons. Whitespace should be removed unless it is in the middle of non-whitespace, non-separator characters, in which case it should be preserved.

Test case 1: ABC,DEF123,GHI_JKL,MN OP
Test case 2: ABC;DEF123;GHI_JKL;MN OP
Test case 3: ABC ; DEF123,GHI_JKL ; MN OP

Sounds like a case for regular expressions, which is fine, but if it's easier or cleaner to do it another way that would be even better.

Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This should be much faster than regex and you can pass a list of separators as you wanted:

def split(txt, seps):
    default_sep = seps[0]

    # we skip seps[0] because that's the default separator
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [i.strip() for i in txt.split(default_sep)]

How to use it:

>>> split('ABC ; DEF123,GHI_JKL ; MN OP', (',', ';'))
['ABC', 'DEF123', 'GHI_JKL', 'MN OP']

Performance test:

import timeit
import re


TEST = 'ABC ; DEF123,GHI_JKL ; MN OP'
SEPS = (',', ';')


rsplit = re.compile("|".join(SEPS)).split
print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 1.6242462980007986

print(timeit.timeit(lambda: split(TEST, SEPS)))
# 1.3588597209964064

And with a much longer input string:

TEST = 100 * 'ABC ; DEF123,GHI_JKL ; MN OP , '

print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 130.67168392999884

print(timeit.timeit(lambda: split(TEST, SEPS)))
# 50.31940778599528

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...