This is not what it seems
string str = "\u1F34E";
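.NET uses UTF-16 to encode its strings. Each char is a 16-bit code unit (two bytes), so a code point above U+FFFF needs two of them. The \u escape sequence takes exactly four hex digits and therefore only covers U+0000 to U+FFFF (16-bit). The extended \U form takes eight hex digits (32-bit), although valid Unicode code points end at U+10FFFF. As a result, "\u1F34E" is read as the character U+1F34 followed by a literal E.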
The emoji 🍎 has a code point above U+FFFF (U+1F34E), so you need to encode it either as a surrogate pair of two UTF-16 code units, "\uD83C\uDF4E", or as a single 32-bit escape, "\U0001F34E". [1]
Example
string str = "\uD83C\uDF4E";
// or
string str = "\U0001F34E";
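To see where the pair above comes from, here is a minimal sketch that derives the two surrogates by hand and compares them with what the framework produces. It is only an illustration: the variable names are made up for this example, and it is written as C# top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical illustration: derive the surrogate pair for U+1F34E by hand.
int codePoint = 0x1F34E;
int offset = codePoint - 0x10000;               // 20-bit value left after removing the BMP range
char high = (char)(0xD800 + (offset >> 10));    // top 10 bits go into the high surrogate
char low = (char)(0xDC00 + (offset & 0x3FF));   // bottom 10 bits go into the low surrogate
string manual = new string(new[] { high, low });

// The same string built by the framework and by the two escape forms.
string fromFramework = char.ConvertFromUtf32(codePoint);
string escaped16 = "\uD83C\uDF4E";
string escaped32 = "\U0001F34E";

// The four-digit escape stops after 1F34, so the trailing E is a plain letter.
string wrong = "\u1F34E";

Console.WriteLine($"{(int)high:X4} {(int)low:X4}");                 // D83C DF4E
Console.WriteLine(manual == fromFramework);                         // True
Console.WriteLine(escaped16 == escaped32);                          // True
Console.WriteLine(escaped32.Length);                                // 2
Console.WriteLine(char.ConvertToUtf32(escaped32, 0).ToString("X")); // 1F34E
Console.WriteLine(wrong[0] == '\u1F34' && wrong[1] == 'E');         // True
```

Note that Length counts UTF-16 code units, which is why the single emoji reports a length of 2.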
If your goal is to separate actual text elements, as opposed to individual chars, you can use StringInfo.GetTextElementEnumerator:
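// requires: using System.Collections.Generic; using System.Globalization;
public static IEnumerable<string> ToElements(string source)
{
    // Each text element is returned whole, so a surrogate pair stays together.
    var enumerator = StringInfo.GetTextElementEnumerator(source);
    while (enumerator.MoveNext())
        yield return enumerator.GetTextElement();
}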
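As a rough usage sketch, assuming a sample string of my own choosing, the difference between counting chars and counting text elements looks like this. ToElements is repeated here as a local function only so the snippet runs on its own as top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Same logic as ToElements above, repeated as a local function for a standalone snippet.
static IEnumerable<string> ToElements(string source)
{
    var enumerator = StringInfo.GetTextElementEnumerator(source);
    while (enumerator.MoveNext())
        yield return enumerator.GetTextElement();
}

// Hypothetical sample: 'a', the apple emoji (U+1F34E), then 'b'.
string text = "a\U0001F34Eb";
List<string> elements = ToElements(text).ToList();

Console.WriteLine(text.Length);        // 4 (the emoji takes two chars)
Console.WriteLine(elements.Count);     // 3 (the emoji is one text element)
Console.WriteLine(elements[1].Length); // 2 (both surrogates kept together)
```

Indexing the string directly with text[1] would return only the high surrogate, which is not a valid character on its own.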
Note: My terminology might not be the most common or accurate. If you think it can be tightened up, feel free to edit.
[1] Thanks to Mark Tolonen for pointing out that the Unicode escape sequence actually supports both 16-bit and 32-bit variants, \uXXXX and \UXXXXXXXX. More information can be found in Jon Skeet's blog post "Strings in C# and .NET".