Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - What is this type of string called?

In Python, we can do something like print("some random string".encode().decode('utf-16')), which outputs: 潳敭爠湡潤?瑳楲杮.

I feel like that is UTF-16, but I'm not really sure, because I can't reproduce it in any other language. My goal is to write a function that does exactly this, but in JavaScript. The problem is that I can't find out what this type of string is called...

Does anyone know what this is called, and/or how I could reproduce it in JS?

question from:https://stackoverflow.com/questions/65852658/what-is-this-type-of-string-called

1 Answer

A string is a sequence of characters. Unicode is a standard that assigns a numeric value (a code point) to each character. UTF-8 and UTF-16 are standards for encoding a sequence of characters, represented by their Unicode code points, as a sequence of bytes.
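
To make the distinction concrete, here is a minimal sketch that prints one character's code point next to its UTF-8 and UTF-16 bytes. It assumes an environment where TextEncoder is a global (modern browsers, Node.js); the euro sign is an arbitrary choice of example character, and hex is a helper introduced only for this illustration.

```javascript
// U+20AC (euro sign), written as an escape so the source file's own
// encoding does not affect the result.
const ch = "\u20ac";

// The Unicode code point: a plain number, independent of any encoding.
const codePoint = ch.codePointAt(0);

// UTF-8 needs three bytes for this character.
const utf8Bytes = Array.from(new TextEncoder().encode(ch));

// UTF-16 needs one 16-bit code unit, shown here as two bytes in
// big-endian order (high byte first).
const unit = ch.charCodeAt(0);
const utf16Bytes = [unit >> 8, unit & 0xff];

const hex = (n) => n.toString(16).padStart(2, "0");
console.log(codePoint.toString(16));         // 20ac
console.log(utf8Bytes.map(hex).join(" "));   // e2 82 ac
console.log(utf16Bytes.map(hex).join(" "));  // 20 ac
```

The same character gives different byte sequences under the two encodings, which is why bytes produced by one cannot be safely decoded with the other.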

What you did there is call encode with the default encoding, which is UTF-8, to get a sequence of bytes, and then decode those bytes back to characters as if they had come from a UTF-16 encoding. Because every character in your input is ASCII and so takes exactly one byte in UTF-8, you are effectively taking pairs of characters from the input, jamming their bytes together into one 16-bit value, and hoping that the result is a legal UTF-16 encoding of something, which in general you cannot count on. With no byte order mark in the data, Python's 'utf-16' codec assumes the machine's native byte order, which is little-endian on most hardware, so the first byte of each pair becomes the low half of the value. You'll also run into trouble if the UTF-8 encoding is not an even number of bytes: decode then raises a UnicodeDecodeError.
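
The two failure modes can be seen by pairing the bytes by hand. The sketch below is illustrative only: inspectPairs is a hypothetical helper name, it assumes TextEncoder is available as a global, and it assumes little-endian pairing (first byte is the low half), which is what the output in the question corresponds to.

```javascript
// Hypothetical helper: show how a string's UTF-8 bytes pair up into
// little-endian 16-bit units, and flag the two problems described above.
function inspectPairs(str) {
  const bytes = new TextEncoder().encode(str);
  const units = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    // First byte is the low half, second byte is the high half.
    units.push(bytes[i] | (bytes[i + 1] << 8));
  }
  return {
    byteCount: bytes.length,
    // An odd byte count leaves one byte without a partner.
    oddLength: bytes.length % 2 === 1,
    units: units.map((u) => u.toString(16).padStart(4, "0")),
    // Conservative check: any unit in the surrogate range D800-DFFF is
    // flagged, even though a correctly ordered high/low pair would be legal.
    hasSurrogates: units.some((u) => u >= 0xd800 && u <= 0xdfff),
  };
}

console.log(inspectPairs("some"));      // units: 6f73, 656d
console.log(inspectPairs("abc"));       // oddLength: true
console.log(inspectPairs("a\u0600b"));  // units: d861, 6280; hasSurrogates: true
```

In the last call, U+0600 encodes to the UTF-8 bytes D8 80, so the first pair becomes D861, a high surrogate that is not followed by a low surrogate. That is exactly the kind of value a strict UTF-16 decoder rejects.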

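If you really need to do this in JavaScript, you could do something like this:

const str = "some random string";
// Round the byte length down to an even number: a Uint16Array view
// can only be created over a whole number of 2-byte units.
const buf = new ArrayBuffer(str.length - (str.length % 2));
// Reinterpret the sequence of bytes as a sequence of byte pairs.
const bufView = new Uint16Array(buf);
for (let i = 0; i < str.length - 1; i += 2) {
  const c1 = str.charCodeAt(i);
  const c2 = str.charCodeAt(i + 1);
  if (c1 > 127 || c2 > 127) {
    // This will be a problem: a non-ASCII character takes more than one
    // byte in UTF-8, so charCodeAt no longer gives you the byte values.
    // How you handle it is up to you.
  }
  // Little-endian, to match the Python output in the question:
  // the first byte is the low half, the second byte is the high half.
  bufView[i / 2] = (c2 << 8) | c1;
}
console.log(String.fromCharCode.apply(String, bufView));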
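
As an alternative that also copes with non-ASCII input, the built-in TextEncoder and TextDecoder can do the byte-level work. This is a sketch under stated assumptions, not a drop-in equivalent of the Python call: utf8BytesAsUtf16 is a hypothetical name, both classes are assumed to be available as globals (modern browsers, Node.js), and the byte order is fixed to little-endian because that is what the output in the question corresponds to.

```javascript
// Hypothetical helper: encode a string as UTF-8, then decode those same
// bytes as little-endian UTF-16.
function utf8BytesAsUtf16(str) {
  const bytes = new TextEncoder().encode(str);
  if (bytes.length % 2 !== 0) {
    // Python's decode raises on truncated input; this sketch does the same
    // instead of silently dropping or replacing the last byte.
    throw new RangeError("UTF-8 encoding has an odd number of bytes");
  }
  // fatal: true makes the decoder throw a TypeError on byte pairs that
  // are not legal UTF-16, rather than substituting U+FFFD.
  return new TextDecoder("utf-16le", { fatal: true }).decode(bytes);
}

// Should match the Python output shown in the question.
console.log(utf8BytesAsUtf16("some random string"));
```

Whether to throw or to substitute a replacement character on bad input is a design choice; dropping the fatal option gives the more lenient behaviour.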
