Extracting approach
You can use a matching approach, as it is the most stable and allows an arbitrary number of escaping \ chars. You can use
(?s)(?:\\.|[^\\|])+
See the regex demo. Details:
(?s) - Pattern.DOTALL embedded flag option
(?:\\.|[^\\|])+ - one or more repetitions of either \ and then any one char, or any char but \ and |.
See the Java demo:
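import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The string literal below holds the text A|B\|C\\|D\\\|E\\\\|F
String s = "A|B\\|C\\\\|D\\\\\\|E\\\\\\\\|F";
// Each match is a run of escaped pairs (\ plus any char) and chars other than \ and |
Pattern pattern = Pattern.compile("(?:\\\\.|[^\\\\|])+", Pattern.DOTALL);
Matcher matcher = pattern.matcher(s);
List<String> results = new ArrayList<>();
while (matcher.find()) {
    results.add(matcher.group());
}
System.out.println(results);
// => [A, B\|C\\, D\\\|E\\\\, F]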
Splitting approach (workaround for split)
You may (ab)use the constrained-width lookbehind support in Java regex and use a limiting quantifier like {0,1000} instead of the * quantifier. A workaround would look like
import java.util.Arrays;
String s = "A|B\\|C\\\\|D\\\\\\|E\\\\\\\\|F"; // the text A|B\|C\\|D\\\|E\\\\|F
String[] results = s.split("(?<=(?<!\\\\)(?:\\\\{2}){0,1000})\\|");
System.out.println(Arrays.toString(results));
See this Java demo.
Note that the (?:\\{2}){0,1000} part only allows up to 1000 double backslashes before the |. That should suffice in most cases, I believe, but you might want to test this first. I'd still recommend the first solution.
Details:
(?<= - start of a positive lookbehind:
(?<!\\) - a location not immediately preceded with a \
(?:\\{2}){0,1000} - zero to one thousand occurrences of a double backslash
) - end of the positive lookbehind
\| - a | char.
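To compare the two approaches side by side, here is a minimal sketch that wraps each one in a method. The class name EscapedPipeDemo and the method names extract and split are my own, chosen only for illustration; the patterns are the ones described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the names here are illustrative, not part of any library.
public class EscapedPipeDemo {
    // Matching approach: runs of escaped pairs (\ plus any char) and chars other than \ and |
    private static final Pattern FIELD =
            Pattern.compile("(?:\\\\.|[^\\\\|])+", Pattern.DOTALL);

    // Splitting approach: a | preceded by zero to 1000 double backslashes
    // that are themselves not preceded by another backslash
    private static final Pattern UNESCAPED_PIPE =
            Pattern.compile("(?<=(?<!\\\\)(?:\\\\{2}){0,1000})\\|");

    public static List<String> extract(String s) {
        List<String> results = new ArrayList<>();
        Matcher matcher = FIELD.matcher(s);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static List<String> split(String s) {
        return Arrays.asList(UNESCAPED_PIPE.split(s));
    }

    public static void main(String[] args) {
        String s = "A|B\\|C\\\\|D\\\\\\|E\\\\\\\\|F"; // the text A|B\|C\\|D\\\|E\\\\|F
        System.out.println(extract(s));
        System.out.println(split(s));
        // The approaches differ on empty fields:
        System.out.println(extract("A||B")); // empty field is skipped
        System.out.println(split("A||B"));   // empty field is kept
    }
}
```

One difference worth knowing: since the matching pattern requires at least one char per match, it silently drops empty fields (A||B gives two items), while split keeps the empty string between the two pipes. Pick whichever behavior fits your data.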