Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


string - Text cleaning and replacement: delete \n from a text in Java

I'm cleaning an incoming text in my Java code. The text includes a lot of "\n", not as in a new line, but literally the two characters "\n". I was using replaceAll() from the String class, but haven't been able to delete the "\n". This doesn't seem to work:

String string;
string = string.replaceAll("\\n", "");

Neither does this:

String string;
string = string.replaceAll("\n", "");

I guess this last one is identified as an actual new line, so all the new lines from the text would be removed.

Also, what would be an effective way to remove different patterns of wrong text from a String? I'm using regular expressions to detect them (stuff like HTML reserved characters, etc.) together with replaceAll, but every time I use replaceAll, the whole String is read, right?

UPDATE: Thanks for your great answers. I've extended this question here:
Text replacement efficiency
I'm asking specifically about efficiency :D


Battle dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Answer


Hooknc is right. I'd just like to post a little explanation:

"\\n" translates to "\n" after the compiler is done (since you escape the backslash). So the regex engine sees "\n", thinks new line, and would remove those (and not the literal "\n" you have).

"\n" is translated to a real new line by the compiler. So the new line character is sent to the regex engine.

"\\\\n" is ugly, but right. The compiler removes the escape sequences, so the regex engine sees "\\n". The regex engine sees the two backslashes and knows that the first one escapes the second, so that translates to checking for the literal characters '\' and 'n', giving you the desired result.
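To see the three escape levels side by side, here is a minimal sketch. The class name, method names and sample text are made up for illustration; only String.replaceAll comes from the question.

```java
public class EscapeLevels {
    // Java source "\\n" becomes the two-character regex \n,
    // which the regex engine reads as "match a real line feed".
    static String removeWithOneEscape(String text) {
        return text.replaceAll("\\n", "");
    }

    // Java source "\n" is already a line feed character by the time
    // the regex engine sees it, so it also matches real line feeds.
    static String removeWithNoEscape(String text) {
        return text.replaceAll("\n", "");
    }

    // Java source "\\\\n" becomes the three-character regex \\n,
    // which matches a literal backslash followed by the letter n.
    static String removeLiteral(String text) {
        return text.replaceAll("\\\\n", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: "one", a literal backslash + n, "two",
        // a real line feed, "three".
        String text = "one\\ntwo\nthree";

        // The first two calls strip only the real line feed.
        System.out.println(removeWithOneEscape(text).equals("one\\ntwothree")); // true
        System.out.println(removeWithNoEscape(text).equals("one\\ntwothree"));  // true

        // The last call strips only the literal backslash + n.
        System.out.println(removeLiteral(text).equals("onetwo\nthree"));        // true
    }
}
```

The first two methods behave the same on this input, which is why neither attempt in the question touched the literal text.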

Java is nice (it's the language I work in), but having to remember to double-escape regexes can be a real challenge. For extra fun, it seems StackOverflow likes to try to translate backslashes too.
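If the double escaping gets in the way, one option is to skip the regex layer entirely. This is a sketch of two alternatives that were not part of the original answer; the class, method and constant names are made up. String.replace takes its target as plain text, and Pattern.quote asks the regex engine to treat the text literally, so in both cases only Java's own string escaping is left.

```java
import java.util.regex.Pattern;

public class LiteralReplace {
    // String.replace(CharSequence, CharSequence) does no regex parsing,
    // so "\\n" here is simply a backslash followed by the letter n.
    static String viaReplace(String text) {
        return text.replace("\\n", "");
    }

    // Pattern.quote produces a regex that matches the given text literally.
    // Compiling once and reusing the Pattern avoids recompiling the regex
    // on every call, which String.replaceAll would otherwise do.
    private static final Pattern LITERAL_BACKSLASH_N =
            Pattern.compile(Pattern.quote("\\n"));

    static String viaQuotedPattern(String text) {
        return LITERAL_BACKSLASH_N.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input: literal backslash + n between a and b,
        // and a real line feed between b and c.
        String text = "a\\nb\nc";
        System.out.println(viaReplace(text).equals("ab\nc"));       // true
        System.out.println(viaQuotedPattern(text).equals("ab\nc")); // true
    }
}
```

Both methods leave the real line feed alone and remove only the literal two-character sequence.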

