Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - Given the number of a Unicode code point, how can I obtain a String or CharSequence object for that character

I have seen Questions and Answers about obtaining the code point number of a Unicode character in Java. For example, the Question "How can I get a Unicode character's code?".

But I want the opposite: given an integer number, how do I get text of that character assigned to that code point number?

The char primitive data type is of no use here, being limited to the Basic Multilingual Plane (BMP) of the Unicode character set. That plane covers only the first 65,536 code points (U+0000 to U+FFFF). But Unicode has grown to nearly double that, with over 113,000 characters defined now, and the code points available for characters run from U+0000 up to U+10FFFF, over a million numbers. Being a 16-bit type, a char is limited to a range of 64K, not nearly enough.
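
To make the 16-bit limit concrete, here is a small illustrative sketch (the class name is arbitrary) showing that a character outside the BMP occupies two char values, a UTF-16 surrogate pair, inside a String, even though it is a single code point.

```java
public class CharLimitDemo {
    public static void main(String[] args) {
        // U+1F637 FACE WITH MEDICAL MASK, written as its UTF-16 surrogate pair.
        String s = "\uD83D\uDE37";

        // length() counts UTF-16 code units (char values), not characters.
        System.out.println("length()           : " + s.length());                      // 2
        // codePointCount() counts actual Unicode code points.
        System.out.println("codePointCount()   : " + s.codePointCount(0, s.length())); // 1

        // The code space ends at U+10FFFF, far beyond the largest char value.
        System.out.println("MAX_CODE_POINT     : " + Integer.toHexString(Character.MAX_CODE_POINT)); // 10ffff
        System.out.println("Character.MAX_VALUE: " + (int) Character.MAX_VALUE);                      // 65535
    }
}
```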

Both Character and String classes offer the method codePointAt to examine a character and return an int representing the code point assigned in Unicode. I am looking for the opposite.

Given an int, how do I get an object of Character, String, or some implementation of CharSequence that I can then join to other text?

When writing string literals, we can use a Unicode escape sequence: a backslash, the letter u, and four hex digits, such as \u00E9. But I am interested in working with integer variables, soft-coding rather than hardcoding the Unicode characters.
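
For comparison, a hardcoded literal for a character beyond the BMP has to spell out both halves of its surrogate pair, because each \u escape in Java denotes a single 16-bit char. The sketch below (class and method names are illustrative only) contrasts that with building the same text from an int variable via Character.toString(int), which requires Java 11 or later.

```java
public class EscapeVsCodePoint {
    // Builds the text for a code point held in an int variable (soft-coded).
    static String fromCodePoint(int codePoint) {
        return Character.toString(codePoint); // Java 11+
    }

    public static void main(String[] args) {
        // Hardcoded: the literal needs both UTF-16 surrogates of U+1F637.
        String hardcoded = "\uD83D\uDE37";

        // Soft-coded: the number could come from a variable, a file, user input, etc.
        int codePoint = 0x1F637;
        String softcoded = fromCodePoint(codePoint);

        System.out.println(hardcoded.equals(softcoded)); // true
    }
}
```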

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…

1 Answer

tl;dr

String s = Character.toString( 128_567 ) ;

😷

Details

You asked for an object of Character, String, or some implementation of CharSequence.

Character

The Character class is actually legacy, a mere object wrapper around the primitive char type. The char type is legacy too, being defined internally as a 16-bit number limited to the first 64K of Unicode code points. Unicode now has more than twice that number of code points assigned to characters, so char fails to represent most characters.
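
One way to see the failure directly: a narrowing cast from int to char keeps only the low 16 bits and silently yields a different code point, while Character.toChars(int) and Character.charCount(int) (standard JDK methods not mentioned in the original answer) confirm that two char values are needed. A minimal sketch:

```java
public class CharTruncationDemo {
    public static void main(String[] args) {
        int codePoint = 128_567; // U+1F637 FACE WITH MEDICAL MASK

        // A narrowing cast discards the high bits: 0x1F637 becomes 0xF637, a different code point.
        char truncated = (char) codePoint;
        System.out.println(Integer.toHexString(truncated)); // f637

        // toChars returns the UTF-16 encoding: two chars (a surrogate pair) for this code point.
        char[] units = Character.toChars(codePoint);
        System.out.println(units.length); // 2

        // charCount reports the same number without allocating an array.
        System.out.println(Character.charCount(codePoint)); // 2
    }
}
```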

So we cannot instantiate a Character object for a character outside the Basic Multilingual Plane. As a workaround, Character.toString( int ), added in Java 11, produces a String containing that single character. String can handle any and all Unicode characters, while Character cannot.
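
Because Character.toString( int ) only exists from Java 11 onward, a common equivalent on older versions is to wrap the result of Character.toChars(int) in a new String. The sketch below (the helper name codePointToString is hypothetical) checks that both approaches agree and that both reject numbers outside the valid code point range.

```java
public class CodePointToString {
    // Works on Java 5 and later: toChars yields one or two UTF-16 chars.
    static String codePointToString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = 128_567; // U+1F637
        String legacyWay = codePointToString(codePoint);
        String java11Way = Character.toString(codePoint); // Java 11+
        System.out.println(legacyWay.equals(java11Way)); // true

        // Values above 0x10FFFF (or negative) are not valid code points; both methods throw.
        try {
            codePointToString(0x110000);
        } catch (IllegalArgumentException e) {
            System.out.println("toChars rejected 0x110000");
        }
        try {
            Character.toString(0x110000);
        } catch (IllegalArgumentException e) {
            System.out.println("toString rejected 0x110000");
        }
    }
}
```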

String via Character.toString( int )

To get a String object containing a single character determined by an int, pass the int to Character.toString().

As an example, we use FACE WITH MEDICAL MASK, an emoji character at U+1F637 (decimal: 128,567).
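
Code points are usually quoted in hexadecimal U+ notation, as above, and 0x1F637 equals 65,536 + 63,031 = 128,567. When soft-coding, the int may arrive as such a string. A minimal sketch, assuming the input always has the form U+XXXX; the helpers parseUPlus and toUPlus are hypothetical, not JDK methods:

```java
public class UPlusNotation {
    // Hypothetical helper: turns "U+1F637" into the int 0x1F637.
    static int parseUPlus(String notation) {
        if (!notation.startsWith("U+")) {
            throw new IllegalArgumentException("expected U+XXXX: " + notation);
        }
        return Integer.parseInt(notation.substring(2), 16);
    }

    // Hypothetical helper: formats an int in U+ notation with at least four hex digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        int codePoint = parseUPlus("U+1F637");
        System.out.println(codePoint);          // 128567
        System.out.println(toUPlus(codePoint)); // U+1F637
    }
}
```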

// -----|  input  |----------------
String input = "😷" ;                                 // FACE WITH MEDICAL MASK at code point U+1F637 (decimal: 128,567).
int codePoint = input.codePointAt( 0 ) ;              // Returns 128,567. 
System.out.println( "codePoint : " + codePoint ) ;   

codePoint : 128567

Convert that int primitive variable to a String.

// -----|  String  |----------------
String output = Character.toString( codePoint ) ;     // Pass an `int` primitive integer number.
System.out.println( "output : " + output ) ; 

output : 😷

Or use a literal integer number.

String output2 = Character.toString( 128_567 ) ;      // Pass an integer literal.
System.out.println( "output2 : " + output2 ) ;

output2 : 😷

See this code run live at IdeOne.com.
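
For convenience, the snippets above can be gathered into one self-contained class. The class name CodePointDemo is arbitrary, and the input uses a surrogate-pair escape instead of the emoji literal so the result does not depend on the source file's encoding.

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // -----|  input  |----------------
        String input = "\uD83D\uDE37";                 // FACE WITH MEDICAL MASK, U+1F637.
        int codePoint = input.codePointAt(0);          // 128567, read across both surrogates.
        System.out.println("codePoint : " + codePoint);

        // -----|  String  |----------------
        String output = Character.toString(codePoint); // Java 11+: int to String.
        System.out.println("output : " + output);

        String output2 = Character.toString(128_567);  // An integer literal works too.
        System.out.println("output2 : " + output2);

        // Round trip: the rebuilt text equals the original input.
        System.out.println(input.equals(output) && output.equals(output2)); // true
    }
}
```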

CharSequence

The code above already meets this need, because String implements the CharSequence interface.

CharSequence cs = Character.toString( 128_567 ) ;     // Returns a `String` which is a `CharSequence`. 
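
Since the result is an ordinary CharSequence, it can be joined to other text with any API that accepts one. A small sketch using String.join, whose elements are CharSequence values; the caption helper and its sample wording are made up for illustration:

```java
public class JoinDemo {
    // String.join accepts any CharSequence elements, including the String from toString(int).
    static String caption(int codePoint) {
        CharSequence mask = Character.toString(codePoint); // Java 11+
        return String.join(" ", "Please wear a mask", mask);
    }

    public static void main(String[] args) {
        String text = caption(128_567);
        System.out.println(text);
        // The emoji contributes 2 chars to length() but only 1 code point.
        System.out.println(text.length());                         // 21
        System.out.println(text.codePointCount(0, text.length())); // 20
    }
}
```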

appendCodePoint

The StringBuilder class offers the method appendCodePoint( int ), which appends the character assigned to a given Unicode code point number. The thread-safe StringBuffer class offers the same method.
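
A minimal sketch of appendCodePoint, building text from several code points held in an int array (the sample values are arbitrary). For comparison it also shows two other standard JDK routes not covered in the original answer: the String(int[], int, int) constructor and collecting an IntStream into a StringBuilder.

```java
import java.util.Arrays;

public class AppendCodePointDemo {
    // Appends each code point; StringBuilder inserts one or two chars as needed.
    static String withBuilder(int[] codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    // Same result via the String(int[] codePoints, int offset, int count) constructor.
    static String withConstructor(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Same result by collecting an IntStream of code points into a StringBuilder.
    static String withStream(int[] codePoints) {
        return Arrays.stream(codePoints)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        int[] codePoints = { 72, 105, 32, 128_567 }; // 'H', 'i', space, U+1F637
        String a = withBuilder(codePoints);
        System.out.println(a);
        System.out.println(a.equals(withConstructor(codePoints)) && a.equals(withStream(codePoints))); // true
    }
}
```

The StringBuilder loop is the most direct fit when code points arrive one at a time; the constructor is convenient when they are already collected in an array.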


...