c++ - Specification of source charset encoding in MSVC++, like gcc "-finput-charset=CharSet"

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I want to create some sample programs that deal with encodings. Specifically, I want to use wide strings like:

wstring a=L"grüßen";
wstring b=L"???? ????!";
wstring c=L"中文";

This matters because these are example programs.

This is absolutely trivial with gcc, which treats source code as UTF-8 encoded text. However, straightforward compilation does not work under MSVC. I know that I can encode the strings using escape sequences, but I would prefer to keep them as readable text.

Is there any option that I can specify as a command-line switch for "cl" to make this work? Is there a command-line switch like gcc's -finput-charset?

If not, how would you suggest keeping the text natural for the user?

Note: adding a BOM to the UTF-8 file is not an option, because the file then becomes non-compilable by other compilers.

Note 2: I need it to work in MSVC version >= 9 (VS 2008).

The real answer: There is no solution

1 Answer

深蓝 · Answer 1 · 2021-10-17T01:06:55+0000

For those who subscribe to the motto "better late than never", Visual Studio 2015 (version 19 of the compiler) now supports this.

The new /source-charset command line switch allows you to specify the character set encoding used to interpret source files. It takes a single parameter, which can be either the IANA or ISO character set name:

/source-charset:utf-8

or the decimal identifier of a particular code page (preceded by a dot):

/source-charset:.65001

The switch is described in the official MSVC documentation, and there is also a detailed article describing these new options on the Visual C++ Team Blog.
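One way to check that the switch took effect is to compare a readable literal against the same text written with universal character names, which do not depend on the source encoding. This is a minimal sketch, assuming the file is saved as UTF-8 without a BOM; the function name is made up for illustration and is not part of any library:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (name chosen for this example only).
// Returns true when the compiler decoded the readable literal the same way
// as the encoding-independent spelling with universal character names.
bool source_was_read_as_utf8() {
    const std::wstring readable = L"grüßen";           // depends on source charset
    const std::wstring escaped  = L"gr\u00FC\u00DFen"; // independent of source charset
    return readable == escaped;
}
```

Calling source_was_read_as_utf8() from main and printing the result should be enough for a quick test. If the compiler reads the file with a different source character set, the two strings will usually differ.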

There is also a complementary /execution-charset switch that works in exactly the same way but controls how narrow character and string literals are generated in the executable. Finally, there is a shortcut switch, /utf-8, that sets both /source-charset:utf-8 and /execution-charset:utf-8.
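The effect of the execution character set can be observed by looking at the bytes of a narrow literal. The sketch below is an illustration, not an official test: the helper name is hypothetical, and it uses a universal character name so that the result depends only on the execution character set, not on how the source file was decoded:

```cpp
#include <cassert>

// Hypothetical helper (name chosen for this example only).
// U+00FC (ü) is encoded in UTF-8 as the two bytes C3 BC, so with a UTF-8
// execution character set the array holds those two bytes plus the terminator.
bool narrow_literals_are_utf8() {
    const char text[] = "\u00FC";
    return sizeof(text) == 3 &&
           static_cast<unsigned char>(text[0]) == 0xC3 &&
           static_cast<unsigned char>(text[1]) == 0xBC;
}
```

With a single-byte execution character set such as Windows-1252, the same literal would be expected to occupy one byte plus the terminator, so the function would return false.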

These command-line options are incompatible with the old #pragma setlocale and #pragma execution_character_set directives, and they apply globally to all source files.

For users stuck on older versions of the compiler, the best option is still to save your source files as UTF-8 with a BOM (as other answers have suggested, the IDE can do this when saving). The compiler will automatically detect this and behave appropriately. So, too, will GCC, which also accepts a BOM at the start of source files without choking to death, making this approach functionally portable.
