I have this working reference Perl script. Its regex was copied from a Java snippet that isn't giving the expected results:
my $regex = '^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$';
if ("A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A" =~ /$regex/)
{
print "Matches 1=$1 2=$2 3=$3 4=$4\n";
}
This correctly outputs:
Matches 1=PROD 2=COMP 3=LOGL 4=00000000-0000-8033-0000-000200354F0A
Now the equivalent Java snippet:
private static final String NON_SYSTEM_TYPE_REGEX = "^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$";
private static final Pattern NON_SYSTEM_TYPE_PATTERN = Pattern.compile(MutableUniqueIdentity.NON_SYSTEM_TYPE_REGEX);
...
final Matcher match = MutableUniqueIdentity.NON_SYSTEM_TYPE_PATTERN.matcher(uniqueIdentity);
The uniqueIdentity input is further back in the stack trace (in a unit test) and is this value:
final String id5CompactString = "A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A";
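For anyone who wants to try this outside the debugger, the Java side can be reduced to a self-contained sketch like the one below. The regex and input are the ones shown above; the class name RegexRepro, the helper groups(), and the use of matches() (the original call is elided behind the "...") are assumptions made for illustration. It only prints what the Matcher reports and makes no claim about what group 3 will be on a given JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexRepro {
    // Same regex as NON_SYSTEM_TYPE_REGEX above.
    static final String REGEX =
        "^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$";
    static final Pattern PATTERN = Pattern.compile(REGEX);

    // Hypothetical helper: returns groups 1..4 in order, or null if the input does not match.
    // Assumption: matches() is used; the original code after matcher(...) is not shown.
    static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i); // may be null for a group that did not participate
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A");
        if (g == null) {
            System.out.println("No match");
        } else {
            System.out.println("Matches 1=" + g[0] + " 2=" + g[1] + " 3=" + g[2] + " 4=" + g[3]);
        }
    }
}
```

Running it prints one line in the same format as the Perl script, which makes the two outputs easy to compare side by side.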
NOTE: The regex and uniqueIdentity values were copied into the Perl program from a debug session to check whether a different language produces a different result (which it did).
ADDITIONAL NOTE: The reason the non-capture group is there is to allow the third element in the string to be optional, so it has to deal with both of these:
A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A
A-PROD-COMP-00000000-0000-8033-0000-000200354F0A
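Since the intent described here is "zero or one" rather than "zero or more", one variant worth comparing is the same pattern with ? in place of * on the non-capturing group. The sketch below is only that: a comparison point under the assumption that at most one optional element ever appears, not an explanation of the result in question. The class name OptionalGroupSketch and the method thirdElement() are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupSketch {
    // Assumption: identical to the regex above, except the non-capturing group
    // is quantified with ? (zero or one) instead of * (zero or more).
    static final Pattern PATTERN = Pattern.compile(
        "^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))?-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$");

    // Hypothetical helper: returns the optional third element,
    // or null when the identifier does not contain one.
    static String thirdElement(String id) {
        Matcher m = PATTERN.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + id);
        }
        return m.group(3);
    }

    public static void main(String[] args) {
        // The two shapes of identifier listed above.
        System.out.println(thirdElement("A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A"));
        System.out.println(thirdElement("A-PROD-COMP-00000000-0000-8033-0000-000200354F0A"));
    }
}
```

With this variant, group 3 should be LOGL for the first identifier and null for the second, because a group that takes no part in the match is reported as null by Matcher.group(int).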
My unit test fails in Java - the third match group, which should be LOGL, is in fact 0000.
Here is a screenshot of the debugger right after the regex match line above:
You can see that the pattern matches, and you can verify that the input parameter (text) and the regex are the same as in the Perl script, but the result is different!
So my question is: Why does match.group(3) have a value of 0000 (when it should have a value of LOGL), and how does that relate back to the regex and the string it is applied to?
In Perl it yields the correct result - LOGL.
Additional info: I have perused this page that highlights the differences between Perl and Java regex engines, and there doesn't appear to be anything applicable.