Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
572 views
in Technique[技术] by (71.8m points)

.net - C# big-endian UCS-2

The project I'm currently working on needs to interface with a client system that we don't make, so we have no control over how data is sent either way. The problem is that were working in C#, which doesn't seem to have any support for UCS-2 and very little support for big-endian. (as far as i can tell)

What I would like to know, is if there's anything i looked over in .net, or something that someone else has made and released that we can use. If not I will take a crack at encoding/decoding it in a custom method, if that's even possible.

But thanks for your time either way.

EDIT: BigEndianUnicode does work to correctly decode the string, the problem was in receiving other data as big endian, so far using IPAddress.HostToNetworkOrder() as suggested elsewhere has allowed me to decode half of the string (Merli? is what comes up and it should be Merlin33069)

Im combing the short code to see if theres another length variable i missed

RESOLUTION: after working out that the bigendian variables was the main problem, i went back through and reviewed the details and it seems that the length of the strings was sent in character counts, not byte counts (in utf it would seem a char is two bytes) all i needed to do was double it, and it worked out. thank you all for your help.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
string x = "abc";
byte[] data = Encoding.BigEndianUnicode.GetBytes(x);

In other direction:

string decodedX = Encoding.BigEndianUnicode.GetString(data);

It is not exactly UCS-2 but it is enough for most cases.

UPD: Unicode FAQ

Q: What is the difference between UCS-2 and UTF-16?

A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not define a distinct data format, because UTF-16 and UCS-2 are identical for purposes of data exchange. Both are 16-bit, and have exactly the same code unit representation.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...