I figured out a solution with some help from the question "Regex lazy vs greedy confusion".
In regex engines like the one used by JavaScript (backtracking NFA engines, I believe), non-greedy only gives you the shortest match going left to right: from the first left-hand match that fits to the nearest right-hand match.
Where there are many left-hand matches for one right-hand match, it will always start from the first one it reaches (which actually gives the longest of those matches).
Essentially, it goes through the string one character at a time asking "Are there matches from this character? If so, match the shortest and finish. If not, move to the next character and repeat". I expected it to be "Are there matches anywhere in this string? If so, match the shortest of all of them".
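A minimal sketch of that behaviour, using a made-up sample string (the variable names are only for illustration). The lazy quantifier shortens the right-hand end, but the match still starts at the first left-hand marker:

```javascript
// Two left-hand markers ("HOHO") precede the first right-hand marker ("_HO_").
const sample = "HOHO aaa HOHO bbb _HO_ ccc _HO_";

// Lazy quantifier: the match still begins at the first "HOHO" the engine
// reaches; only the right-hand end is kept as near as possible.
const lazyMatch = sample.match(/HOHO.*?_HO_/);
console.log(lazyMatch.index, JSON.stringify(lazyMatch[0]));
// prints: 0 "HOHO aaa HOHO bbb _HO_"

// Greedy quantifier, for comparison: same start, farthest right-hand end.
const greedyMatch = sample.match(/HOHO.*_HO_/);
console.log(greedyMatch.index, JSON.stringify(greedyMatch[0]));
// prints: 0 "HOHO aaa HOHO bbb _HO_ ccc _HO_"
```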
You can approximate a regex that is non-greedy in both directions by replacing the . with a negation meaning "not the left-side match". To negate a string like this requires negative lookaheads and non-capturing groups, but it's as simple as dropping the string into (?:(?!).), for example (?:(?!HOHO).)
The equivalent of HOHO.*?_HO_ which is non-greedy on the left and right would be:
HOHO(?:(?!HOHO).)*?_HO_
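As a quick check, here is that pattern run against a made-up input with two left-hand markers (the input string and variable names are assumptions for illustration). The match should now start at the left-hand marker nearest to the right-hand one:

```javascript
// The first "HOHO" cannot reach "_HO_" without crossing another "HOHO",
// so the engine gives up on it and moves on to the second one.
const input = "HOHO aaa HOHO bbb _HO_ ccc _HO_";

const tempered = /HOHO(?:(?!HOHO).)*?_HO_/;
const temperedMatch = input.match(tempered);

console.log(temperedMatch.index, JSON.stringify(temperedMatch[0]));
// prints: 9 "HOHO bbb _HO_"
```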
So the regex engine is essentially going through each character like this:
HOHO - Does this match the left side?
(?:(?!HOHO).)* - If so, can I reach the right-hand side without any repeats of the left side?
_HO_ - If so, grab everything until the right-hand match
? modifier on * or + - If there are multiple right-hand matches, choose the nearest one
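If you need this for arbitrary literal delimiters, the same pieces can be assembled programmatically. The sketch below is only one way to do it: buildTemperedPattern and escapeRegExp are hypothetical helper names, not built-ins, and it assumes both delimiters are plain strings rather than sub-patterns.

```javascript
// Hypothetical helper: escape characters that are special in a regex so a
// literal delimiter can be embedded safely in a pattern.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build left + (?:(?!left).)*? + right.
function buildTemperedPattern(left, right) {
  const l = escapeRegExp(left);
  const r = escapeRegExp(right);
  return new RegExp(l + "(?:(?!" + l + ").)*?" + r);
}

const hoPattern = buildTemperedPattern("HOHO", "_HO_");
console.log(hoPattern.source); // prints: HOHO(?:(?!HOHO).)*?_HO_

const bracketMatch = "<< x << y >> z >>".match(buildTemperedPattern("<<", ">>"));
console.log(JSON.stringify(bracketMatch[0])); // prints: "<< y >>"
```

Note that . does not match line terminators by default in JavaScript, so a match spanning several lines would need the s flag or a different "any character" class.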