java - UTF-8 Encoding ; Only some Japanese characters are not getting converted

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am receiving a parameter value from a Jersey web service, and the value contains Japanese characters.

Here, 'japaneseString' is the web service parameter containing the Japanese characters:

   String name = new String(japaneseString.getBytes(), "UTF-8");

Some string literals convert successfully, but others cause problems.

The following were successfully converted:

 1) アップル
 2) 赤
 3) 世丕且且世两上与丑万丣丕且丗丕
 4) 世世丗丈

These did not:

 1) ひほわれよう
 2) 存在する

When I investigated further, I found that these two strings are being converted into junk characters:

 1) Input: ひほわれよう    Output: ????????れよ???
 2) Input: 存在する        Output: 存在???る

Any idea why some of the japanese characters are not converted properly?

Thanks.

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:59:17+0000

You are mixing concepts here.

A String is just a sequence of characters (chars); a String in itself has no encoding at all. For what it's worth, replace characters in the above with carrier pigeons. Same thing. A carrier pigeon has no encoding. Neither does a char. (1)
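To make the point concrete: in the sketch below (the class name is hypothetical), the same one-character String produces different byte arrays depending on which charset you ask for, and each array decodes back to the identical String only when you use the charset that produced it. The encoding lives in the conversion, not in the String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringHasNoEncoding {
    public static void main(String[] args) {
        String s = "赤"; // a single char, U+8D64

        // Bytes only appear when we encode, and the charset decides what they are.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);       // E8 B5 A4 (3 bytes)
        byte[] utf16be = s.getBytes(StandardCharsets.UTF_16BE); // 8D 64 (2 bytes)

        // Arrays.toString prints Java's signed byte values.
        System.out.println("UTF-8:    " + Arrays.toString(utf8));
        System.out.println("UTF-16BE: " + Arrays.toString(utf16be));

        // Decoding each array with the charset that produced it restores the same String.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s));
        System.out.println(new String(utf16be, StandardCharsets.UTF_16BE).equals(s));
    }
}
```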

What you are doing here:

   new String(x.getBytes(), "UTF-8")

is a "poor man's encoding/decoding process". You will probably have noticed that there are two versions of .getBytes(): one where you pass a charset as an argument and the other where you don't.
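A minimal sketch of the two overloads (the class name is hypothetical). The no-argument version encodes with `Charset.defaultCharset()`, which depends on the platform on Java 17 and earlier; since Java 18 (JEP 400) the default is UTF-8 unless it is configured otherwise. Passing a charset explicitly makes the result independent of the machine the code runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesOverloads {
    public static void main(String[] args) {
        String s = "存在する";

        // No argument: uses whatever the JVM's default charset is.
        byte[] implicit = s.getBytes();
        byte[] explicitDefault = s.getBytes(Charset.defaultCharset());
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("same bytes: " + Arrays.equals(implicit, explicitDefault));

        // Explicit charset: the same bytes on every machine.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte count: " + utf8.length); // 4 chars, 3 bytes each
    }
}
```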

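If you don't, which is what happens here, the string is encoded with your platform's default character set, and you then try to decode that byte sequence as UTF-8. Unless the default character set happens to be UTF-8, those bytes are not guaranteed to be valid UTF-8, and each malformed sequence is decoded as a replacement character.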
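Why only some characters break is not spelled out above. One plausible explanation, and it is only an assumption about the asker's setup: the value reached the code already mis-decoded as windows-1252, and windows-1252 was also the platform default. windows-1252 leaves the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. The UTF-8 encodings of ひ, ほ, わ, う and す each contain 0x81 or 0x8F, while those of the characters that survived in the question (アップル, 赤, 存, 在, れ, よ, る and the 世 group) contain none of them. The sketch below simulates both steps with explicit charsets so it behaves the same on any JVM; `garble`, `convert` and the class name are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRoundTrip {
    // Assumption: the asker's default charset was windows-1252.
    static final Charset ASSUMED_DEFAULT = Charset.forName("windows-1252");

    // Hypothetical upstream bug: UTF-8 request bytes decoded as windows-1252.
    // Undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) become U+FFFD here.
    static String garble(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), ASSUMED_DEFAULT);
    }

    // The question's conversion, with the default charset written out.
    // U+FFFD cannot be encoded in windows-1252, so it turns into '?' (0x3F),
    // which breaks the surrounding UTF-8 sequence.
    static String convert(String received) {
        return new String(received.getBytes(ASSUMED_DEFAULT), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] inputs = {"アップル", "赤", "存在する", "ひほわれよう"};
        for (String s : inputs) {
            String result = convert(garble(s));
            System.out.println(s + " -> " + result + (result.equals(s) ? "  (intact)" : "  (damaged)"));
        }
    }
}
```

Under this assumption the first two inputs come back intact and the last two come back damaged, with 存在, れよ and る surviving. A console that cannot display U+FFFD will typically print it as '?'. If this is what happened, the real bug is the upstream decoding, not the conversion line.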

Don't do that. Just take the string as it comes. If, however, you have trouble reading the original byte stream into a string, it means you are using a Reader with the wrong charset. Fix that part.
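In a Jersey application the framework usually performs this decoding for you, typically guided by the charset declared in the request's Content-Type header; that configuration is outside this sketch. It only shows the general pattern referred to above: decode bytes through a Reader that names its charset explicitly. `readAll` and the class name are hypothetical, and UTF-8 is assumed to be what the client actually sends.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Decode a byte stream into a String, naming the charset instead of relying on the default.
    static String readAll(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a request body that the client sent as UTF-8.
        byte[] body = "ひほわれよう".getBytes(StandardCharsets.UTF_8);
        String value = readAll(new ByteArrayInputStream(body));
        // The String is correct as read; no getBytes()/new String(...) round trip afterwards.
        System.out.println(value.equals("ひほわれよう"));
    }
}
```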

For more information, read this link.

(1) That a char is, in fact, a UTF-16 code unit is irrelevant to this discussion.
