Referring to Joel's article:
Some people are under the misconception that Unicode is simply a 16-bit code where each character takes 16 bits and therefore there are 65,536 possible characters. This is not, actually, correct.
After reading the whole article, my point is this: if someone tells you his text is in Unicode, you have no idea how much memory each of his characters takes up. He has to tell you, "My Unicode text is encoded in UTF-8"; only then do you know how much memory each character takes up.
Unicode = not necessarily 2 bytes per character
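To illustrate what I mean, here is a minimal Python sketch I put together (Python is my own choice here, not something from Joel's article). It encodes the same three characters in several Unicode encodings and counts the bytes:

    # The byte count per character depends on the encoding, not on "Unicode" itself.
    text = "Aé中"  # 'A' (U+0041), 'é' (U+00E9), '中' (U+4E2D)

    for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
        data = text.encode(encoding)
        print(f"{encoding}: {len(data)} bytes for {len(text)} characters")

    # utf-8:     6 bytes (1 + 2 + 3)
    # utf-16-le: 6 bytes (2 + 2 + 2)
    # utf-32-le: 12 bytes (4 + 4 + 4)

So the same Unicode text occupies a different number of bytes per character depending on which encoding is used.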
However, when it comes to Code Project's article and Microsoft's Help, this confuses me:
Microsoft:
Unicode is a 16-bit character encoding, providing enough encodings for all languages. All ASCII characters are included in Unicode as "widened" characters.
Code Project:
The Unicode character set is a "wide character" (2 bytes per character) set that contains every character available in every language, including all technical symbols and special publishing characters. Multibyte character set (MBCS) uses either 1 or 2 bytes per character.
Unicode = 2 bytes per character?
Are 65,536 possible characters enough to represent every language in the world?
Why does the concept seem different between the web developer community and the desktop developer community?
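Regarding the 65,536 question, here is another short Python sketch (again my own example, not taken from the quoted articles) showing a character whose code point lies above 0xFFFF, so it cannot fit in a single 16-bit unit and needs a surrogate pair even in UTF-16:

    # U+1F600 (GRINNING FACE) lies outside the first 65,536 code points (the BMP).
    ch = "\U0001F600"

    print(hex(ord(ch)))                 # 0x1f600, which is greater than 0xffff
    print(len(ch.encode("utf-16-le")))  # 4 bytes: two 16-bit code units (a surrogate pair)
    print(len(ch.encode("utf-8")))      # 4 bytes in UTF-8 as well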