You are mixing concepts here.
A String is just a sequence of characters (chars); a String in itself has no encoding at all. For what it's worth, replace characters in the above with carrier pigeons. Same thing. A carrier pigeon has no encoding. Neither does a char. (1)
What you are doing here:
new String(x.getBytes(), "UTF-8")
is a "poor man's encoding/decoding process". You will probably have noticed that there are two versions of .getBytes(): one where you pass a charset as an argument and the other where you don't.
If you don't, and that is what happens here, it means you will get the result of the encoding process using your default character set; and then you try and re-decode this byte sequence using UTF-8.
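To make the failure mode concrete, here is a minimal sketch of what that round trip does when the two charsets differ. The class and method names (CharsetRoundTrip, encodeThenDecode) are made up for illustration, and ISO-8859-1 merely stands in for a non-UTF-8 default charset; your platform default may well be something else.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Hypothetical helper: encode a String with one charset, then decode
    // the resulting bytes with another. With x.getBytes() the first charset
    // is whatever the platform default happens to be.
    static String encodeThenDecode(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", written with an escape to stay ASCII-safe

        // Same charset on both sides: the String survives intact.
        String same = encodeThenDecode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original));

        // Assumed non-UTF-8 default (ISO-8859-1 here), decoded as UTF-8:
        // the lone byte 0xE9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        String mangled = encodeThenDecode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(mangled.equals(original));
    }
}
```

The String going in was fine; the only thing the round trip can do is leave it unchanged (when the charsets happen to match) or damage it (when they don't).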
Don't do that. Just take in the string as it comes. If, however, you have trouble reading the original byte stream into a string, it means you are using a Reader with the wrong charset. Fix that part.
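As a rough sketch of "fixing that part": name the charset explicitly at the point where bytes become chars, instead of relying on the default. The class and method names below (ReaderCharsetDemo, readAll) are hypothetical, and UTF-8 is only an assumption about what the source actually sends; use whatever encoding the bytes really are in.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderCharsetDemo {
    // Hypothetical helper: decode an entire byte stream into a String,
    // using the charset the caller states the bytes are encoded in.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a file or socket: bytes known to be UTF-8 encoded.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset yields the intended String.
        String text = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        System.out.println(text.equals("caf\u00e9"));
    }
}
```

Once the Reader is given the right charset, the resulting String needs no further "re-encoding" step.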
For more information, read this link.
(1) The fact that a char is actually a UTF-16 code unit is irrelevant to this discussion.