When answering this question (and having read this answer to a similar question), I thought that I knew how Python caches regexes.
But then I thought I'd test it, comparing two scenarios:
- a single compilation of a simple regex, then 10 applications of that compiled regex.
- 10 applications of an uncompiled regex (where I would have expected slightly worse performance because the regex would have to be compiled once, then cached, and then looked up in the cache 9 times).
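The caching assumption behind that expectation can be checked directly. This is only a minimal sketch: the helper name `compile_twice` is made up for illustration, and the identity check relies on CPython's `re` module handing back the cached pattern object, which is an implementation detail rather than a documented guarantee.

```python
import re


def compile_twice(pattern):
    """Compile the same pattern twice through the module-level API."""
    first = re.compile(pattern)
    second = re.compile(pattern)
    return first, second


first, second = compile_twice(r"\w+")

# On CPython the second call is expected to be served from re's internal
# cache, so both names should refer to the same compiled pattern object.
print(first is second)
```

If the cache works as assumed, the uncompiled scenario should only pay for a cache lookup on each of the remaining nine calls.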
However, the results were staggering (in Python 3.3):
>>> import timeit
>>> timeit.timeit(setup="import re",
... stmt='r=re.compile(r"\w+")\nfor i in range(10):\n r.search(" jkdhf ")')
18.547793477671938
>>> timeit.timeit(setup="import re",
... stmt='for i in range(10):\n re.search(r"\w+"," jkdhf ")')
106.47892003890324
That's over 5.7 times slower! In Python 2.7, there is still an increase by a factor of 2.5, which is also more than I would have expected.
Has caching of regexes changed between Python 2 and 3? The docs don't seem to suggest that.
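For anyone who wants to reproduce this on their own interpreter, here is a rough script version of the same comparison. It is a sketch, not a rigorous benchmark: the function names, the `number` and `repeat` values, and the choice of taking the minimum of several runs are my own assumptions, and the absolute timings will differ by machine and Python version.

```python
import re
import timeit

TEXT = " jkdhf "
PATTERN = r"\w+"


def precompiled(n=10):
    # Scenario 1: compile once, then apply the compiled pattern n times.
    r = re.compile(PATTERN)
    for _ in range(n):
        r.search(TEXT)


def module_level(n=10):
    # Scenario 2: call re.search with the pattern string n times,
    # relying on the re module's internal cache.
    for _ in range(n):
        re.search(PATTERN, TEXT)


def compare(number=10000, repeat=5):
    # Take the best of several runs to reduce noise from other processes.
    t_pre = min(timeit.repeat(precompiled, number=number, repeat=repeat))
    t_mod = min(timeit.repeat(module_level, number=number, repeat=repeat))
    return t_pre, t_mod, t_mod / t_pre


if __name__ == "__main__":
    t_pre, t_mod, ratio = compare()
    print("precompiled:  %.4f s" % t_pre)
    print("module-level: %.4f s" % t_mod)
    print("ratio:        %.2f" % ratio)
```

Running the same script under different Python versions gives a like-for-like ratio, which is easier to compare than two separate interactive sessions.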