Excel Encodings
I found the WINDOWS-1252
encoding to be the least frustrating when dealing with Excel. Since its basically Microsofts own proprietary character set, one can assume it will work on both the Mac and the Windows version of MS-Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector which correctly reads the data.
Depending on your system and the tools you use, this encoding could also be named CP1252
, ANSI
, Windows (ANSI)
, MS-ANSI
or just Windows
, among other variations.
This encoding is a superset of ISO-8859-1
(aka LATIN1
and others), so you can fallback to ISO-8859-1
if you cannot use WINDOWS-1252
for some reason. Be advised that ISO-8859-1
is missing some characters from WINDOWS-1252
as shown here:
| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name | Unicode Range |
| € | 128 | 8364 | 0x80 | U+20AC | € | euro sign | Currency Symbols |
| ? | 130 | 8218 | 0x82 | U+201A | ‚ | single low-9 quotation mark | General Punctuation |
| ? | 131 | 402 | 0x83 | U+0192 | ƒ | Latin small letter f with hook | Latin Extended-B |
| ? | 132 | 8222 | 0x84 | U+201E | „ | double low-9 quotation mark | General Punctuation |
| … | 133 | 8230 | 0x85 | U+2026 | … | horizontal ellipsis | General Punctuation |
| ? | 134 | 8224 | 0x86 | U+2020 | † | dagger | General Punctuation |
| ? | 135 | 8225 | 0x87 | U+2021 | ‡ | double dagger | General Punctuation |
| ? | 136 | 710 | 0x88 | U+02C6 | ˆ | modifier letter circumflex accent | Spacing Modifier Letters |
| ‰ | 137 | 8240 | 0x89 | U+2030 | ‰ | per mille sign | General Punctuation |
| ? | 138 | 352 | 0x8A | U+0160 | Š | Latin capital letter S with caron | Latin Extended-A |
| ? | 139 | 8249 | 0x8B | U+2039 | ‹ | single left-pointing angle quotation mark | General Punctuation |
| ? | 140 | 338 | 0x8C | U+0152 | Œ | Latin capital ligature OE | Latin Extended-A |
| ? | 142 | 381 | 0x8E | U+017D | | Latin capital letter Z with caron | Latin Extended-A |
| ‘ | 145 | 8216 | 0x91 | U+2018 | ‘ | left single quotation mark | General Punctuation |
| ’ | 146 | 8217 | 0x92 | U+2019 | ’ | right single quotation mark | General Punctuation |
| “ | 147 | 8220 | 0x93 | U+201C | “ | left double quotation mark | General Punctuation |
| ” | 148 | 8221 | 0x94 | U+201D | ” | right double quotation mark | General Punctuation |
| ? | 149 | 8226 | 0x95 | U+2022 | • | bullet | General Punctuation |
| – | 150 | 8211 | 0x96 | U+2013 | – | en dash | General Punctuation |
| — | 151 | 8212 | 0x97 | U+2014 | — | em dash | General Punctuation |
| ? | 152 | 732 | 0x98 | U+02DC | ˜ | small tilde | Spacing Modifier Letters |
| ? | 153 | 8482 | 0x99 | U+2122 | ™ | trade mark sign | Letterlike Symbols |
| ? | 154 | 353 | 0x9A | U+0161 | š | Latin small letter s with caron | Latin Extended-A |
| ? | 155 | 8250 | 0x9B | U+203A | › | single right-pointing angle quotation mark | General Punctuation |
| ? | 156 | 339 | 0x9C | U+0153 | œ | Latin small ligature oe | Latin Extended-A |
| ? | 158 | 382 | 0x9E | U+017E | | Latin small letter z with caron | Latin Extended-A |
| ? | 159 | 376 | 0x9F | U+0178 | Ÿ | Latin capital letter Y with diaeresis | Latin Extended-A |
Note that the euro sign is missing.
This table can be found at Alan Wood.
Conversion
Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv
which you know is UTF-8
encoded. Convert it to WINDOWS-1252
using iconv
:
iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…