No, there's no standard way; the behavior differs between operating systems. Here's how each OS handles non-ASCII filenames.
Linux
Under Linux, a filename is simply a binary string: any sequence of bytes that contains neither '/' nor NUL. The convention on most modern distributions is to use UTF-8 for non-ASCII filenames, but early on it was common to encode filenames as ISO-8859-1. It's basically up to each application to choose an encoding, so you can even have different encodings used on the same filesystem. The LANG environment variable (along with LC_ALL and LC_CTYPE) can give you a hint about the preferred encoding. But these days, you can probably assume UTF-8 everywhere.
This is not without problems, though, because a filename containing an invalid UTF-8 sequence is perfectly valid on most Linux filesystems. How would you specify such a filename if you only support UTF-8? Ideally, you should support both UTF-8 and binary filenames.
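One way to support both cases is to check whether a name is well-formed UTF-8 before treating it as text, and to fall back to handling it as opaque bytes otherwise. The following is a minimal sketch of such a check; `is_valid_utf8` is a hypothetical helper name, not part of any standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the len bytes at s are well-formed UTF-8:
   no overlong forms, no surrogates, nothing above U+10FFFF.
   Hypothetical helper, written for illustration only. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t extra;      /* continuation bytes expected after the lead byte */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point allowed for this length */

        if (b < 0x80) { i++; continue; }  /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < extra) return false;  /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return false;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return false;                 /* outside Unicode */
        i += extra + 1;
    }
    return true;
}
```

A name that fails this check can still be passed unchanged to fopen on Linux; it just should not be displayed or converted as if it were UTF-8.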
OS X
The HFS+ filesystem on OS X stores filenames as Unicode (UTF-16) internally. Most C (and POSIX) library functions like fopen accept UTF-8 strings (since they're 8-bit compatible) and convert them internally.
Windows
The Windows API uses UTF-16 for filenames, but fopen uses the current codepage, whatever that is (UTF-8 only recently became an option). Many C library functions have a non-standard equivalent that accepts UTF-16 (wchar_t on Windows). For example, _wfopen instead of fopen.
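If you want to use UTF-8 paths on every platform, one common approach is a small wrapper that converts to UTF-16 and calls _wfopen on Windows, and calls fopen directly elsewhere. The sketch below assumes the caller always passes UTF-8; `fopen_utf8` and `utf8_to_wide` are hypothetical names chosen for illustration, while MultiByteToWideChar and _wfopen are the actual Windows functions.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a NUL-terminated UTF-8 string to a newly allocated UTF-16 string.
   Returns NULL on invalid input or allocation failure; the caller frees it. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call: ask how many wchar_t units are needed (including the NUL). */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0) return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Open a file whose path is given as UTF-8 (an assumption of this sketch).
   Hypothetical helper, not a standard function. */
FILE *fopen_utf8(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = (wpath != NULL && wmode != NULL) ? _wfopen(wpath, wmode) : NULL;
    free(wpath);
    free(wmode);
    return f;
#else
    /* Linux and OS X: the bytes are passed through to the OS as-is. */
    return fopen(path, mode);
#endif
}
```

On Windows this sketch returns NULL without setting errno when the path is not valid UTF-8, so a real implementation would probably want to report that case separately.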