Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
303 views
in Technique[技术] by (71.8m points)

python - functional difference between lookarounds and non-capture group?

I'm trying to come up with an example where positive look-around works but non-capture groups won't work, to further understand their usages. The examples I"m coming up with all work with non-capture groups as well, so I feel like I"m not fully grasping the usage of positive look around.

Here is a string, (taken from a SO example) that uses positive look ahead in the answer. The user wanted to grab the second column value, only if the value of the first column started with ABC, and the last column had the value 'active'.

string ='''ABC1    1.1.1.1    20151118    active
          ABC2    2.2.2.2    20151118    inactive
          xxx     x.x.x.x    xxxxxxxx    active'''

The solution given used 'positive look ahead' but I noticed that I could use non-caputure groups to arrive at the same answer. So, I'm having trouble coming up with an example where positive look-around works, non-capturing group doesn't work.

pattern =re.compile('ABCws+(S+)s+(?=S+s+active)') #solution

pattern =re.compile('ABCws+(S+)s+(?:S+s+active)') #solution w/out lookaround

If anyone would be kind enough to provide an example, I would be grateful.

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The fundamental difference is the fact, that non-capturing groups still consume the part of the string they match, thus moving the cursor forward.

One example where this makes a fundamental difference is when you try to match certain strings, that are surrounded by certain boundaries and these boundaries can overlap. Sample task:

Match all as from a given string, that are surrounded by bs - the given string is bababaca. There should be two matches, at positions 2 and 4.

Using lookarounds this is rather easy, you can use b(a)(?=b) or (?<=b)a(?=b) and match them. But (?:b)a(?:b) won't work - the first match will also consume the b at position 3, that is needed as boundary for the second match. (note: the non-capturing group isn't actually needed here)

Another rather prominent sample are password validations - check that the password contains uppercase, lowercase letters, numbers, whatever - you can use a bunch of alternations to match these - but lookaheads come in way easier:

(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!?.])

vs

(?:.*[a-z].*[A-Z].*[0-9].*[!?.])|(?:.*[A-Z][a-z].*[0-9].*[!?.])|(?:.*[0-9].*[a-z].*[A-Z].*[!?.])|(?:.*[!?.].*[a-z].*[A-Z].*[0-9])|(?:.*[A-Z][a-z].*[!?.].*[0-9])|...

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...