It is the same value in both versions. You're just printing it under a locale that doesn't support some of the characters, so the Unicode replacement character (U+FFFD) is displayed in their place. The ef bf bd sequences in your output mark where an unrecognized character became the replacement character; whatever you used to convert to bytes then encoded that replacement character as UTF-8, which is ef bf bd.
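As a rough illustration of where ef bf bd comes from, here is a minimal Python 3 sketch. The input bytes are made up for the example, and errors="replace" stands in for whatever step in your pipeline did the substitution, which the question doesn't show.

```python
# Hypothetical input: 0xff can never appear in valid UTF-8.
raw = b"caf\xff"

# Decoding with errors="replace" substitutes U+FFFD for the undecodable byte.
text = raw.decode("utf-8", errors="replace")

# Encoding back to UTF-8 turns U+FFFD into the three bytes ef bf bd.
hex_dump = text.encode("utf-8").hex()

print(repr(text))
print(hex_dump)
```

The last six hex digits of the dump are efbfbd, the same sequence that shows up in your output.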
When the locale is correct and your terminal and font can handle the result, it works identically on Python 2 and Python 3. The only real difference is that Python 3 behaves somewhat more sanely under some locales (e.g. the Windows console uses UTF-8 automatically as of 3.6, and the legacy C locale is coerced to a UTF-8 one as of 3.7). You got the same string either way; it's the output and display step that produces the wrong result while trying to cope with unencodable characters.
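If you want to check whether the output side is the culprit, something like the following Python 3 sketch may help. The helper name encodable is made up for this example; the two encodings it prints are the ones Python reports for your environment.

```python
import locale
import sys

def encodable(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# What Python believes about the output stream and the locale.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "unknown"
preferred = locale.getpreferredencoding(False)
print("stdout encoding:", stdout_encoding)
print("locale preferred encoding:", preferred)

# The same string is fine under UTF-8 but not under plain ASCII.
sample = "\xe9"  # é
print(encodable(sample, "utf-8"), encodable(sample, "ascii"))
```

If the reported encoding can't represent your characters, the string is intact and only the display is lossy.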
To be clear, Python 2 str is not limited to ASCII. In terms of what it can hold, it's equivalent to Python 3 bytes; both can hold arbitrary byte values in the range [0, 256). The literals differ (Py2 allows non-ASCII characters in a literal without escapes, though without a file encoding declaration that isn't portable), but Py2 str can hold '\xff' just like Py3 bytes can hold b'\xff'.
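The byte range is easy to confirm on the Python 3 side. This is a small sketch run under Python 3 only; the Python 2 str behavior is as described above and isn't exercised here.

```python
# A bytes object can hold every value in range(256).
all_bytes = bytes(range(256))
high = b"\xff"

# Indexing bytes in Python 3 yields ints.
top_value = high[0]

# Anything outside [0, 256) is rejected.
try:
    bytes([256])
    rejected = False
except ValueError:
    rejected = True

print(len(all_bytes), top_value, rejected)
```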
Note that your code often won't work identically when the str contains characters outside the ASCII range that weren't inserted using escapes (in Python 2, what non-ASCII characters in a string literal mean depends on the file's encoding declaration). It definitely won't work the same for anything outside latin-1 (those characters have ordinals of 256 or more in Py3, and who knows what in Py2) unless the inputs are of unicode type in Python 2 (e.g. for literals, prefixed with u).
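To make the latin-1 boundary concrete, here is a short Python 3 sketch; the two sample characters are arbitrary picks, one inside latin-1 and one outside it.

```python
inside = "\xe9"     # é, U+00E9, within latin-1
outside = "\u20ac"  # €, U+20AC, beyond latin-1

# Ordinals: latin-1 characters fit in one byte, others do not.
inside_ord = ord(inside)
outside_ord = ord(outside)

# A latin-1 character maps to the single byte with the same value.
inside_bytes = inside.encode("latin-1")

# A character beyond latin-1 cannot be encoded that way.
try:
    outside.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# UTF-8 handles it, using several bytes.
outside_utf8 = outside.encode("utf-8")

print(inside_ord, outside_ord, inside_bytes, fits_latin1, outside_utf8)
```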