unicode - What is the difference between EM Dash &#151; and &#8212;?

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:58:55+0000

&#151; is wrong. When you use numeric character references, the number refers to the Unicode codepoint. For numbers below 256 that is the same as the codepoint in ISO-8859-1. In 8859-1, character 151 is amongst the “C1 control codes”, and not a dash or any other visible character. The em dash is codepoint 8212 (U+2014), so &#8212; is the reference to use.
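To see the difference between the two numbers, you can ask the Unicode database what each codepoint is. This is a minimal sketch in Python using the standard unicodedata module; the helper name describe is just an illustration, not something from the question.

```python
import unicodedata


def describe(codepoint: int) -> str:
    """Return a short description of a Unicode codepoint (illustrative helper)."""
    ch = chr(codepoint)
    # "Cc" is the general category for control characters, "Pd" for dash punctuation.
    category = unicodedata.category(ch)
    # Control characters have no formal name, so fall back to a placeholder.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{codepoint:04X} category={category} name={name}"


print(describe(151))   # the codepoint that &#151; refers to
print(describe(8212))  # the codepoint that &#8212; refers to
```

The first line reports a control character in category Cc, the second reports EM DASH in category Pd.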

The confusion arises because character 151 is a dash in Windows code page 1252 (Western European). Many people think cp1252 is the same thing as ISO-8859-1, but in reality it's not: the characters in the C1 range (128 to 159) are different.
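The mismatch is easy to reproduce by decoding the same byte both ways. The sketch below assumes Python's built-in cp1252 and iso-8859-1 codecs; the variable names are only for illustration.

```python
# The single byte with value 151 (0x97), as it would appear in a Windows text file.
raw = bytes([151])

as_cp1252 = raw.decode("cp1252")      # Windows code page 1252
as_latin1 = raw.decode("iso-8859-1")  # ISO-8859-1 (Latin-1)

# cp1252 maps the byte to the em dash; ISO-8859-1 maps it to a C1 control code.
assert as_cp1252 == "\u2014"
assert as_latin1 == "\u0097"

# Collect the bytes in the C1 range (128 to 159) where the two encodings disagree.
# A few bytes are undefined in cp1252, so errors="replace" is used to avoid exceptions.
differing = [
    b for b in range(128, 160)
    if bytes([b]).decode("cp1252", errors="replace") != bytes([b]).decode("iso-8859-1")
]
print(f"{len(differing)} bytes differ, starting at {differing[0]} and ending at {differing[-1]}")
```

Outside that range the two encodings decode every byte identically, which is why the two are so often mistaken for each other.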

The first application is reading your “ASCII” file* as ISO-8859-1, but actually it's probably cp1252 and you'll need a way to clue the app in about what encoding it has to expect.
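How you tell an application which encoding to expect depends on the application. As one hedged illustration, here is what it looks like in Python: the file is opened with an explicit cp1252 encoding instead of relying on a default. The function names and the sample file are hypothetical, and the assumption that the file really is cp1252 is exactly that, an assumption you would need to confirm for your own data.

```python
import os
import tempfile


def read_legacy_text(path: str, encoding: str = "cp1252") -> str:
    """Read a text file using an explicit encoding (cp1252 is an assumption here)."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        return handle.read()


def to_ascii_html(text: str) -> str:
    """Replace non-ASCII characters with numeric character references.

    Note: this does not escape &, < or >; it only handles non-ASCII characters.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Demo: write a small file containing byte 151, then read it back as cp1252.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(sample_path, "wb") as handle:
        handle.write(b"before\x97after")
    text = read_legacy_text(sample_path)
    html_safe = to_ascii_html(text)

print(html_safe)  # before&#8212;after
```

Decoded as cp1252, the byte becomes a real em dash, and the numeric character reference generated from it is the Unicode one, 8212, not 151.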

(*: “ASCII” is a misnomer if there are top-bit-set characters in the file. You probably mean “ANSI”, which is really also a misnomer, but one which has stuck in the Windows world to mean “text encoded in the current system-default code page”.)
