Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - Regex to match a C-style multiline comment

I have a string, for example:

String src = "How are things today /* this is comment **/ and is your code  /** this is another comment */ working?";

I want to remove the /* this is comment **/ and /** this is another comment */ substrings from the src string.

I tried to use a regex but could not get it to work, as I have little experience with them.


1 Answer


The best multiline comment regex is an unrolled version of (?s)/\*.*?\*/ that looks like

String pat = "/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/";

See the regex demo and explanation at regex101.com.

In short,

  • /\* - match the comment start /*
  • [^*]*\*+ - match 0+ characters other than *, followed by 1+ literal *
  • (?:[^/*][^*]*\*+)* - 0+ sequences of:
    • [^/*][^*]*\*+ - a character that is not / or * (matched with [^/*]), followed by 0+ non-asterisk characters ([^*]*), followed by 1+ asterisks (\*+)
  • / - closing /
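Applied to the string from the question, the pattern above can be used roughly as follows. This is a minimal sketch: the class name CommentStripper and the helper strip are made up for illustration, and only java.util.regex.Pattern from the standard library is used.

```java
import java.util.regex.Pattern;

public class CommentStripper {
    // Unrolled pattern from the answer. Every regex backslash is doubled
    // because it sits inside a Java string literal.
    private static final Pattern COMMENT =
            Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    // Hypothetical helper: removes every /* ... */ comment from the input.
    public static String strip(String input) {
        return COMMENT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "How are things today /* this is comment **/ and is your code"
                + "  /** this is another comment */ working?";
        // Both comments are removed; the spaces around them are left in place.
        System.out.println(strip(src));
    }
}
```

No DOTALL flag is needed here, because the negated classes [^*] and [^/*] already match line breaks, so comments spanning several lines are handled too. Note that the pattern knows nothing about string literals, so a /* that appears inside a quoted string in real source code would also be treated as a comment start.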

David's regex needs 26 steps to find the match in my example string, and my regex needs just 12 steps. With huge inputs, David's regex is likely to fail with a stack overflow or a similar error, because lazy dot matching (.*?) is inefficient: the engine expands the lazy pattern one character at a time and checks at each position whether the rest of the pattern matches, whereas my pattern consumes linear chunks of text in one go.
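To check that the unrolled pattern is a drop-in replacement for the lazy-dot one, the two can be run side by side. The sketch below is only a sanity check, not a benchmark: java.util.regex does not report step counts, so the 26 vs. 12 figures from regex101.com are not reproduced here. The class name CommentRegexComparison and the helper sampleInput are made up for illustration.

```java
import java.util.regex.Pattern;

public class CommentRegexComparison {
    // Lazy-dot version; (?s) lets the dot match line breaks as well.
    private static final Pattern LAZY = Pattern.compile("(?s)/\\*.*?\\*/");

    // Unrolled version from the answer.
    private static final Pattern UNROLLED =
            Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    public static String stripLazy(String s) {
        return LAZY.matcher(s).replaceAll("");
    }

    public static String stripUnrolled(String s) {
        return UNROLLED.matcher(s).replaceAll("");
    }

    // Hypothetical test input: one comment spanning `lines` lines, each
    // containing a stray asterisk, wrapped in "before " and " after".
    public static String sampleInput(int lines) {
        StringBuilder sb = new StringBuilder("before /*");
        for (int i = 0; i < lines; i++) {
            sb.append(" line ").append(i).append(" * \n");
        }
        sb.append("*/ after");
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = sampleInput(1000);
        // Prints whether both patterns strip the comment identically.
        System.out.println(stripLazy(input).equals(stripUnrolled(input)));
    }
}
```

For well-formed comments both patterns should remove exactly the same text. The difference lies in how much work the engine does to get there.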

