Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
357 views
in Technique[技术] by (71.8m points)

python - Why does this take so long to match? Is it a bug?

I need to match certain URLs in web application, i.e. /123,456,789, and wrote this regex to match the pattern:

r'(d+(,)?)+/$'

I noticed that it does not seem to evaluate, even after several minutes when testing the pattern:

re.findall(r'(d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523')

The expected result would be that there were no matches.

This expression, however, executes almost immediately (note the trailing slash):

re.findall(r'(d+(,)?)+/$', '12345121,223456,123123,3234,4523,523523/')

Is this a bug?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

There is some catastrophic backtracking going on that will cause an exponential amount of processing depending on how long the non-match string is. This has to do with your nested repetitions and optional comma (even though some regex engines can determine that this wouldn't be a match with attempting all of the extraneous repetition). This is solved by optimizing the expression.


The easiest way to accomplish this is to just look for 1+ digits or commas followed by a slash and the end of the string: [d,]+/$. However, that is not perfect since it would allow for something like ,123,,4,5/.

For this you can use a slightly optimized version of your initial try: (?:d,?)+/$. First, I made your repeating group non-capturing ((?:...)) which isn't necessary but it provides for a "cleaner match". Next, and the only crucial step, I stopped repeating the d inside of the group since the group is already repeating. Finally, I removed the unnecessary group around the optional , since ? only affects the last character. Pretty much this will look for one digit, maybe a comma, then repeat, and finally followed by a trailing /.


This can still match an odd string 1,2,3,/, so for the heck of it I improved your original regex with a negative lookbehind: (?:d,?)+(?<!,)/$. This will assert that there is no comma directly before the trailing /.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...