My primary concern is with the Java flavor, but I'd also appreciate information about other flavors.
Let's say you have a subpattern like this:
(.*)(.*)
Not very useful as is, but let's say these two capture groups (say, \1 and \2) are part of a bigger pattern that refers back to them with backreferences, etc.
So both are greedy, in that they try to capture as much as possible, only taking less when they have to.
My question is: who's greedier? Does \1 get first priority, giving \2 its share only if it has to?
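One way to probe the two-group case on Java's engine is to anchor the pattern, match a short string, and print what each group captured. This is a minimal sketch: the class name and the groups helper are made up for illustration, and the negative lookahead (?!$) after group 1 is not part of the original pattern. It is only a device that stops \1 from ending at the end of the input, so that \1 is forced to give something back.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoGreedyGroups {
    // Hypothetical helper: matches the whole input and returns every group
    // as "<g1><g2>...", or null if the pattern does not match.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            sb.append('<').append(m.group(g)).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unconstrained: \1 grabs everything, \2 is left empty.
        System.out.println(groups("^(.*)(.*)$", "abc"));      // prints "<abc><>"
        // (?!$) (added only for this experiment) forbids \1 from ending at the
        // end of input, so \1 backs off by one character and \2 picks it up.
        System.out.println(groups("^(.*)(?!$)(.*)$", "abc")); // prints "<ab><c>"
    }
}
```

So in this probe \1 is served first, and \2 only gets what \1 is made to give back.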
What about:
(.*)(.*)(.*)
Let's assume that \1 does get first priority. Let's say it gets too greedy and then spits out a character. Who gets it first? Is it always \2, or can it be \3?
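The "who gets \1's first rejection" question can be tested the same way on Java's engine. In this minimal sketch the class name and the groups helper are hypothetical, and the lookahead (?!$) after group 1 is an artificial constraint added only to force \1 to give back exactly one character of "abc".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreeGroupsRejection {
    // Hypothetical helper: returns every group as "<g1><g2><g3>", or null on no match.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            sb.append('<').append(m.group(g)).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \1 may not reach the end, so it backs off to "ab"; the freed "c"
        // is then offered to \2 first, which is greedy and keeps it.
        System.out.println(groups("^(.*)(?!$)(.*)(.*)$", "abc")); // prints "<ab><c><>"
    }
}
```

With all three groups greedy, the character \1 spits out lands in \2, and \3 stays empty, because \2 is the next quantifier the engine tries after \1.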
Let's assume it's \2 that gets \1's rejection. If this still doesn't work, who spits out next? Does \2 spit to \3, or does \1 spit out another character to \2 first?
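A small experiment can distinguish the two possibilities on Java's engine. In this sketch (class name and helper are hypothetical), a lookahead (?!$) is placed after both \1 and \2 as an artificial constraint: \1 must give up "c", and \2 is then not allowed to end at the end of the input either. If \1 gave up another character before \2 passed "c" on, the result would start with "<a>"; if \2 passes it to \3 first, the result is "<ab><><c>".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreeGroupsNextSpit {
    // Hypothetical helper: returns every group as "<g1><g2><g3>", or null on no match.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            sb.append('<').append(m.group(g)).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \1 backs off to "ab", \2 grabs "c" but fails the second (?!$),
        // so the most recent choice point (\2) backs off and \3 takes "c".
        System.out.println(groups("^(.*)(?!$)(.*)(?!$)(.*)$", "abc")); // prints "<ab><><c>"
    }
}
```

This matches how a backtracking engine generally works: on failure it revisits the most recent choice point first, so \2 and \3 exhaust their splits of the leftover text before \1 is asked to give up another character.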
Bonus question
What happens if you write something like this:
(.*)(.*?)(.*)
Now \2 is reluctant. Does that mean \1 spits out to \3, and \2 only reluctantly accepts \3's rejection?
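The reluctant case can be probed with the same kind of sketch on Java's engine. The class name and helper are hypothetical; (?!$) after \1 forces it to give up "c", and the extra (?!c) after \2 in the second pattern is an artificial constraint that makes the "\2 takes nothing" option fail.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantMiddle {
    // Hypothetical helper: returns every group as "<g1><g2><g3>", or null on no match.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            sb.append('<').append(m.group(g)).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reluctant \2 first tries the empty string, so greedy \3 takes the "c".
        System.out.println(groups("^(.*)(?!$)(.*?)(.*)$", "abc"));      // prints "<ab><><c>"
        // (?!c) rejects the empty \2, so \2 reluctantly expands to "c".
        System.out.println(groups("^(.*)(?!$)(.*?)(?!c)(.*)$", "abc")); // prints "<ab><c><>"
    }
}
```

In these probes the character \1 gives back effectively ends up in \3, and \2 only grows when the rest of the pattern fails with \2 empty.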
Examples
Maybe it was a mistake not to give concrete examples showing how I'm using these patterns, so here are some:
System.out.println(
"OhMyGod=MyMyMyOhGodOhGodOhGod"
.replaceAll("^(.*)(.*)(.*)=(\1|\2|\3)+$", "<$1><$2><$3>")
); // prints "<Oh><My><God>"
// same pattern, different input string
System.out.println(
"OhMyGod=OhMyGodOhOhOh"
.replaceAll("^(.*)(.*)(.*)=(\1|\2|\3)+$", "<$1><$2><$3>")
); // prints "<Oh><MyGod><>"
// now 2 is reluctant
System.out.println(
"OhMyGod=OhMyGodOhOhOh"
.replaceAll("^(.*)(.*?)(.*)=(\1|\2|\3)+$", "<$1><$2><$3>")
); // prints "<Oh><><MyGod>"
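The three outputs above can be reproduced with a simplified model of the backtracking search order. This is not Java's regex implementation, just a sketch under stated assumptions: the input contains exactly one '=', so \1\2\3 must split the text before it; \1 is tried from longest to shortest; \2 from longest to shortest (or shortest to longest when reluctant); \3 takes the rest; and a split succeeds if the text after '=' is a concatenation of the non-empty groups, which is what (\1|\2|\3)+$ requires. The class and method names are made up for illustration.

```java
public class SplitOrderModel {
    // True if rhs is a concatenation of the non-empty pieces (simple word-break DP).
    static boolean segmentable(String rhs, String... pieces) {
        boolean[] ok = new boolean[rhs.length() + 1];
        ok[0] = true;
        for (int pos = 0; pos < rhs.length(); pos++) {
            if (!ok[pos]) continue;
            for (String p : pieces) {
                if (!p.isEmpty() && rhs.startsWith(p, pos)) {
                    ok[pos + p.length()] = true;
                }
            }
        }
        return ok[rhs.length()];
    }

    // Tries (\1, \2, \3) splits of lhs in the order assumed above and
    // returns the first one that lets rhs match, formatted like "<$1><$2><$3>".
    static String firstSplit(String lhs, String rhs, boolean lazyMiddle) {
        int n = lhs.length();
        for (int i = n; i >= 0; i--) {                      // \1 gives back one char at a time
            int rest = n - i;
            for (int step = 0; step <= rest; step++) {
                int j = lazyMiddle ? step : rest - step;     // \2: shortest first if lazy, else longest
                String g1 = lhs.substring(0, i);
                String g2 = lhs.substring(i, i + j);
                String g3 = lhs.substring(i + j);            // \3 must run up to the '='
                if (segmentable(rhs, g1, g2, g3)) {
                    return "<" + g1 + "><" + g2 + "><" + g3 + ">";
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstSplit("OhMyGod", "MyMyMyOhGodOhGodOhGod", false)); // prints "<Oh><My><God>"
        System.out.println(firstSplit("OhMyGod", "OhMyGodOhOhOh", false));         // prints "<Oh><MyGod><>"
        System.out.println(firstSplit("OhMyGod", "OhMyGodOhOhOh", true));          // prints "<Oh><><MyGod>"
    }
}
```

In the model, every split with a longer \1 is rejected before \1 shrinks to "Oh", and within that, greedy \2 keeps as much as it can ("MyGod") while reluctant \2 settles for "" and leaves the rest to \3, which is the same pattern the replaceAll outputs show.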