It all depends on what you understand to be a "word". Perhaps you'd better define what you understand to be a word delimiter: for example, blanks, commas, and so on. Then write something like:
phrase = phrase.replaceAll("([\\s,.;])" + Pattern.quote(word) + "([\\s,.;])", "$1$2");
But you'll additionally have to check for occurrences at the start and the end of the string.
For example:
String phrase = "bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
String word = "bob";
// Remove the word only when a delimiter sits on both sides; keep the delimiters.
phrase = phrase.replaceAll("([\\s,.;])" + Pattern.quote(word) + "([\\s,.;])", "$1$2");
System.out.println(phrase);
prints this:
bob has a bike ,  and boba bob's bike is red and "bob" stuff.
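The start-of-string and end-of-string cases can be folded into the pattern itself by using lookarounds instead of capturing the delimiters. The following is only a sketch under my own assumptions: the class name RemoveWord, the helper removeWord and the delimiter set are illustrative choices, not something fixed by the question.

```java
import java.util.regex.Pattern;

class RemoveWord {
    // Hypothetical helper: removes `word` only when it is delimited on both
    // sides by one of the chosen delimiters or by the start/end of the string.
    static String removeWord(String phrase, String word) {
        String delims = "\\s,.;"; // assumed delimiter set, adjust to taste
        // (?<![^...]) = "not preceded by a non-delimiter" (also true at the start)
        // (?![^...])  = "not followed by a non-delimiter" (also true at the end)
        String regex = "(?<![^" + delims + "])" + Pattern.quote(word) + "(?![^" + delims + "])";
        return phrase.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String phrase = "bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
        System.out.println(removeWord(phrase, "bob"));
    }
}
```

Because the lookarounds consume no characters, two occurrences that share a single delimiter (as in "bob bob") are both removed, and the leading "bob" of the sample phrase goes away too, while boba, bob's and "bob" are left alone.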
Update: If you insist on using \b, considering that the "word boundary" understands Unicode, you can also do this dirty trick: replace all occurrences of ' and " by some Unicode letters that you're sure will not appear in your text, and afterwards do the reverse replacement. Example:
String phrase = "bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
String word = "bob";
// Placeholders: two distinct letters assumed never to occur in the text.
phrase = phrase.replace('\'', 'ñ').replace('"', 'ö');
phrase = phrase.replaceAll("\\b" + Pattern.quote(word) + "\\b", "");
// Undo the placeholder substitution.
phrase = phrase.replace('ö', '"').replace('ñ', '\'');
System.out.println(phrase);
UPDATE: To summarize some comments below: one would expect \w and \b to have the same notion of what a "word character" is, as almost every regular-expression dialect does. Well, Java does not: \w considers ASCII, \b considers Unicode. It's an ugly inconsistency, I agree.
Update 2: Since Java 7 (as pointed out in comments) the UNICODE_CHARACTER_CLASS flag lets you request consistent, Unicode-aware behaviour for both \w and \b.
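To make the effect of the flag concrete, here is a small sketch. The class UnicodeWordDemo and its two helper methods are names I made up for illustration; only Pattern.UNICODE_CHARACTER_CLASS, Pattern.compile and Pattern.quote come from the JDK.

```java
import java.util.regex.Pattern;

class UnicodeWordDemo {
    // Hypothetical helper: does the whole string `s` match a single \w
    // under the given flags?
    static boolean isWordChar(String s, int flags) {
        return Pattern.compile("\\w", flags).matcher(s).matches();
    }

    // Hypothetical helper: is `word` present as a whole word when both \w
    // and \b follow the Unicode definition of a word character?
    static boolean containsWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String enye = "\u00F1"; // the letter ñ, written as an escape to avoid encoding issues
        System.out.println(isWordChar(enye, 0));                               // false: default \w is ASCII-only
        System.out.println(isWordChar(enye, Pattern.UNICODE_CHARACTER_CLASS)); // true
        System.out.println(containsWholeWord("bob" + enye, "bob"));            // false: ñ continues the word
        System.out.println(containsWholeWord("bob, hi", "bob"));               // true
    }
}
```

As far as I know, newer JDKs (19 and later, I believe) changed the default \b to be ASCII-only like \w, so if you rely on \b treating non-ASCII letters as word characters, passing the flag explicitly is the safer choice.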