Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
687 views
in Technique[技术] by (71.8m points)

unicode - MySQL CHAR() Function and UTF8 Output?

+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_database   | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_sets_dir       | /usr/local/mysql-5.1.41-osx10.5-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.41    |
+-----------+
1 row in set (0.00 sec)

mysql> select char(0x00FC);
+--------------+
| char(0x00FC) |
+--------------+
| ?            |
+--------------+
1 row in set (0.00 sec)

Expecting actual utf8 character --> " ü " instead of " ? " Tried char(0x00FC using utf8) also, but no go.

Using mysql version 5.1.41

Been allover the Google, cannot find anything on this. The MySQL docs simply say that multibyte output is expected on values greater than 255, after mysql version 5.0.14.

Thanks

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You are confusing UTF-8 with Unicode.

0x00FC is the Unicode code point for ü:

mysql> select char(0x00FC using ucs2);
+----------------------+
| char(0x00FC using ucs2) |
+----------------------+
| ü                   | 
+----------------------+

In UTF-8 encoding, 0x00FC is represented by two bytes:

mysql> select char(0xC3BC using utf8);
+-------------------------+
| char(0xC3BC using utf8) |
+-------------------------+
| ü                      | 
+-------------------------+

UTF-8 is merely a way of encoding Unicode characters in binary form. It is meant to be space efficient, which is why ASCII characters only take a single byte, and iso-8859-1 characters such as ü only take two bytes. Some other characters take three or four bytes, but they are much less common.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...