Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Storing binary data in UTF-8 string

I want to use a WebSocket to transfer binary data, but you can only use WebSockets to transfer UTF-8 strings.

Encoding it using base64 is one option, but my understanding is that base64 is most desirable when your text might be converted from one format to another. In this case, I know the data will always be UTF-8, so is there a better way of encoding binary data in a UTF-8 string without paying base64's 33% size premium?
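For reference, here is a small sketch that measures the Base64 premium, assuming Node.js (where `Buffer` is a global). `base64Overhead` is a hypothetical helper. Base64 turns every 3 input bytes into 4 ASCII characters, so the output is 4/3 the size of the input.

```javascript
// Sketch (assumes Node.js, where Buffer is a global): measure Base64's size premium.
// base64Overhead is a hypothetical helper name, not a library API.
function base64Overhead(byteLength) {
  const data = new Uint8Array(byteLength); // contents don't matter, only the length
  const encoded = Buffer.from(data).toString("base64");
  return encoded.length / byteLength; // output characters per input byte
}

console.log(base64Overhead(3000)); // 1.3333333333333333
```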

This question is mostly academic, as binary support will probably be added to WebSocket eventually, and base64 is a perfectly cromulent alternative in the meantime.


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer

You could use a Base-128 encoding instead of a Base-64 encoding. Its overhead is only 1/7 (about 14%), compared with 1/3 (about 33%) for Base-64.
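The two overhead figures follow from how many data bits each one-byte output character carries: 6 for Base-64 and 7 for Base-128 (ignoring Base-64's end padding).

```latex
\text{Base-64: } \frac{8}{6} = \frac{4}{3} = 1 + \frac{1}{3} \;(\approx 33.3\%\ \text{overhead})
\qquad
\text{Base-128: } \frac{8}{7} = 1 + \frac{1}{7} \;(\approx 14.3\%\ \text{overhead})
```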

The idea is to use all Unicode code points that UTF-8 represents in a single byte (U+0000 to U+007F, i.e. 0–127). Each such byte begins with a 0 bit, which leaves seven bits for the data:

0xxxxxxx
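The single-byte property can be checked directly with `TextEncoder`, which always encodes to UTF-8. This sketch assumes a browser or Node.js 11+ (where `TextEncoder` is a global), and `isSingleByteUtf8` is a hypothetical helper name.

```javascript
// Sketch: confirm that code points 0–127 each encode to exactly one UTF-8 byte
// whose most significant bit is 0 (the pattern 0xxxxxxx), while U+0080 does not.
const utf8 = new TextEncoder(); // TextEncoder always produces UTF-8

// isSingleByteUtf8 is a hypothetical helper, not a standard API.
function isSingleByteUtf8(codePoint) {
  const bytes = utf8.encode(String.fromCodePoint(codePoint));
  return bytes.length === 1 && (bytes[0] & 0x80) === 0;
}

let allSevenBit = true;
for (let cp = 0; cp < 128; cp++) {
  if (!isSingleByteUtf8(cp)) allSevenBit = false;
}
console.log(allSevenBit);           // true
console.log(isSingleByteUtf8(128)); // false (U+0080 needs two bytes: 0xC2 0x80)
```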

That results in an encoding where 7 input bytes are encoded using 8 output bytes:

input:  aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg
output: 0aaaaaaa 0abbbbbb 0bbccccc 0cccdddd 0ddddeee 0eeeeeff 0ffffffg 0ggggggg

So the output to input ratio is 8/7.
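Here is a minimal JavaScript sketch of this scheme. It assumes the data arrives as a `Uint8Array` and the resulting string is sent as a WebSocket text message. `encodeBase128` and `decodeBase128` are hypothetical names, not a standard API. Bits are packed most significant first, matching the layout above. The final partial group is zero-padded. The decoder can drop that padding without knowing the original length, because it is always shorter than 7 bits and so can never form a whole byte.

```javascript
// Sketch of a Base-128 codec: each output character (U+0000–U+007F) carries
// 7 data bits, packed MSB-first as in the layout above. Not a standard format.
function encodeBase128(bytes) {
  let out = "";
  let buffer = 0; // pending bits, right-aligned
  let bits = 0;   // number of pending bits (always < 7 between input bytes)
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bits += 8;
    while (bits >= 7) {
      bits -= 7;
      out += String.fromCharCode((buffer >> bits) & 0x7f); // emit top 7 pending bits
    }
    buffer &= (1 << bits) - 1; // keep only the unconsumed bits
  }
  if (bits > 0) {
    // Pad the final partial group with zero bits on the right.
    out += String.fromCharCode((buffer << (7 - bits)) & 0x7f);
  }
  return out;
}

function decodeBase128(text) {
  const out = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0x7f) throw new RangeError(`invalid Base-128 character at index ${i}`);
    buffer = (buffer << 7) | code;
    bits += 7;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff); // emit one full byte
      buffer &= (1 << bits) - 1;
    }
  }
  // Any leftover bits (fewer than 7) are the encoder's zero padding and are dropped.
  return Uint8Array.from(out);
}

// Usage: 7 input bytes become 8 characters, i.e. 8 bytes of UTF-8.
const input = Uint8Array.from([0xff, 0x00, 0x80, 0x7f, 0x01, 0xfe, 0x55]);
const encoded = encodeBase128(input);
console.log(encoded.length); // 8
const decoded = decodeBase128(encoded);
console.log(decoded.length === input.length && decoded.every((b, i) => b === input[i])); // true
```

Every character is below U+0080, so the string's UTF-8 size equals its length: `Math.ceil(8 * n / 7)` bytes for `n` input bytes, not counting frame headers. The output can contain control characters such as U+0000. These are valid UTF-8 and therefore valid in a WebSocket text message, but they are worth checking for if the string passes through other layers that treat them specially.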



...