
java - How to match a string with a regex only if it's between two delimiters?

My goal is to delete all matches from an input using a regular expression with Java 7:

input.replaceAll([regex], "");

Given this example input with a target string abc-:

<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>

What regex could I use in the code above to match abc- only when it is between the <TAG> and </TAG> delimiters? Here is the desired matching behaviour, with <--> for a match:

               <--><-->     <-->                                       <-->     <--><-->
<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>

Expected result:

<TAG>test-test-test-test-</TAG>test-abc-test-abc-<TAG>test-test-</TAG>
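For contrast, here is a minimal sketch of the naive call with the bare pattern. The class NaiveRemoval and its method are hypothetical names. It removes every abc-, including the two between </TAG> and <TAG>, so it does not produce the expected result:

```java
// Hypothetical demo class: a plain replaceAll("abc-", "") ignores the
// <TAG> ... </TAG> delimiters and also deletes the occurrences outside them.
final class NaiveRemoval {

    static final String INPUT =
        "<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>";

    // Removes every "abc-", regardless of where it appears.
    static String removeEverywhere(String input) {
        return input.replaceAll("abc-", "");
    }

    public static void main(String[] args) {
        // Prints <TAG>test-test-test-test-</TAG>test-test-<TAG>test-test-</TAG>
        // The abc- between </TAG> and <TAG> is gone too, unlike the expected result.
        System.out.println(removeEverywhere(INPUT));
    }
}
```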

The left and right delimiters are always different. I am not particularly looking for a recursive solution (nested delimiters).

I think this might be doable with lookaheads and/or lookbehinds, but I didn't get anywhere with them.

Question from: https://stackoverflow.com/questions/65907832/how-to-match-a-string-with-a-regex-only-if-its-between-two-delimiters
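Since the question mentions lookaheads, a lookahead-only variant is sketched here for comparison. It is not the pattern from the answer; it is a different technique that assumes every <TAG> is closed and tags are never nested (the class LookaheadRemoval is a hypothetical name). The idea is to remove abc- only when a </TAG> follows before any <TAG> does:

```java
import java.util.regex.Pattern;

// Hypothetical class: a lookahead-only alternative, NOT the approach used in the answer.
// Assumes well-formed, non-nested <TAG>...</TAG> pairs.
final class LookaheadRemoval {

    // abc- matches only if a </TAG> comes before any <TAG>,
    // i.e. the abc- sits inside an open <TAG>...</TAG> pair.
    private static final Pattern INSIDE_TAG =
        Pattern.compile("(?s)abc-(?=(?:(?!<TAG>).)*?</TAG>)");

    static String removeInsideTags(String input) {
        return INSIDE_TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input =
            "<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>";
        // Prints <TAG>test-test-test-test-</TAG>test-abc-test-abc-<TAG>test-test-</TAG>
        System.out.println(removeInsideTags(input));
    }
}
```

Each lookahead rescans up to the next delimiter, so long tag bodies cost more work, and malformed input (a stray </TAG> with no opening tag) would make an abc- before it count as "inside".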


1 Answer


You can use a regex like

(?s)(\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-

See the regex demo. Replace the matches with $1$2, which writes back what Groups 1 and 2 captured, so only abc- is removed. Details:

  • (?s) - the embedded flag form of Pattern.DOTALL, so . also matches line break characters
  • (\G(?!^)|<TAG>(?=.*?</TAG>)) - Group 1 ($1): either of the two:
    • \G(?!^) - the position where the previous successful match ended; (?!^) stops \G from matching at the very start of the input
    • | - or
    • <TAG>(?=.*?</TAG>) - <TAG> that is immediately followed by any zero or more chars, as few as possible, and then </TAG> (this makes sure the closing, right-hand delimiter actually occurs further in the string)
  • ((?:(?!<TAG>|</TAG>).)*?) - Group 2 ($2): any one char (.), zero or more repetitions but as few as possible (*?), that does not start a <TAG> or </TAG> character sequence (a so-called tempered greedy token)
  • abc- - the pattern to be removed, abc-.
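To see how \G chains consecutive matches inside one <TAG> block, a small sketch can list every match with its groups. MatchTrace and trace are hypothetical names; the pattern is the one above, written as a Java string literal. Only the first match in each block goes through the <TAG> branch; the later ones start exactly where the previous match ended:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists every match of the answer's pattern with its groups.
final class MatchTrace {

    static final String INPUT =
        "<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>";

    // Same regex as above; \G is written as \\G inside the Java string literal.
    static final Pattern PATTERN =
        Pattern.compile("(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-");

    // One line per match: [start,end) followed by the text captured by groups 1 and 2.
    static List<String> trace(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            lines.add("[" + m.start() + "," + m.end() + ") $1=\"" + m.group(1)
                    + "\" $2=\"" + m.group(2) + "\"");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints:
        // [0,19) $1="<TAG>" $2="test-test-"
        // [19,23) $1="" $2=""
        // [23,32) $1="" $2="test-"
        // [61,75) $1="<TAG>" $2="test-"
        // [75,84) $1="" $2="test-"
        // [84,88) $1="" $2=""
        for (String line : trace(INPUT)) {
            System.out.println(line);
        }
    }
}
```

The chain stops after [23,32) because Group 2 cannot step over </TAG>, so the two abc- between the blocks are never matched; the next <TAG> restarts the chain.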

In Java:

// \G must be escaped as \\G inside a Java string literal
String pattern = "(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-";
String result = text.replaceAll(pattern, "$1$2");
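A self-contained sketch that precompiles the same pattern and also exercises a multi-line tag body (where (?s) matters) and an unclosed <TAG> (which the lookahead leaves untouched). The class RemoveBetweenTags and its method are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the answer's pattern and "$1$2" replacement.
final class RemoveBetweenTags {

    static final String INPUT =
        "<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>";

    // Compiled once; "\\G" in the literal becomes the regex token \G.
    private static final Pattern ABC_INSIDE_TAGS =
        Pattern.compile("(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-");

    // Writes back group 1 (<TAG> or an empty string) and group 2 (text before abc-), dropping abc-.
    static String removeAbcInsideTags(String text) {
        return ABC_INSIDE_TAGS.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // Prints <TAG>test-test-test-test-</TAG>test-abc-test-abc-<TAG>test-test-</TAG>
        System.out.println(removeAbcInsideTags(INPUT));
        // (?s) lets the tag body span lines: prints <TAG> and </TAG> on two lines
        System.out.println(removeAbcInsideTags("<TAG>abc-\nabc-</TAG>"));
        // No closing </TAG>, so nothing is removed: prints <TAG>abc-
        System.out.println(removeAbcInsideTags("<TAG>abc-"));
    }
}
```

Precompiling with Pattern.compile avoids recompiling the regex on every String.replaceAll call, which matters if the replacement runs in a loop.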
