C++11 has this functionality:
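// #include <codecvt>, <locale>, <string>
std::string s = u8"Hello, World!"; // valid up to C++17; from C++20 on, u8 literals have type char8_t
// std::codecvt's destructor is protected, so this line needs the adapter shown further down
std::wstring_convert<std::codecvt<char16_t,char,std::mbstate_t>,char16_t> convert;
std::u16string u16 = convert.from_bytes(s);
std::string utf8 = convert.to_bytes(u16);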
However, to my knowledge the only implementation that has this so far is libc++. C++11 also has std::codecvt_utf8_utf16<char16_t>, which some other implementations have. Specifically, codecvt_utf8_utf16 works in VS 2010 and above, and since Windows uses wchar_t to represent UTF-16, you can use it (as codecvt_utf8_utf16<wchar_t>) to convert between UTF-8 and Windows' native encoding.
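As a minimal sketch of the codecvt_utf8_utf16 route described above: the helper names (utf8_to_utf16, utf16_to_utf8, utf8_to_wide) are hypothetical and chosen only for illustration. Keep in mind that <codecvt> and std::wstring_convert were deprecated in C++17, so a recent compiler may warn about this code.

```cpp
#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.from_bytes(utf8);
}

// Hypothetical helper: UTF-16 code units -> UTF-8 bytes.
std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.to_bytes(utf16);
}

// Hypothetical helper for the Windows case: the result holds UTF-16 code
// units stored in wchar_t, which matches the native encoding only where
// wchar_t is 16 bits wide (as on Windows).
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    return convert.from_bytes(utf8);
}
```

The converter objects are created per call because wstring_convert keeps conversion state; reusing one across threads would need extra care.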
"The specialization codecvt<char16_t, char, mbstate_t> converts between the UTF-16 and UTF-8 encoding schemes, and the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes."
— [locale.codecvt] 22.4.1.4/3
Oh, and std::codecvt specializations have protected destructors, and wstring_convert needs to destroy the facet it owns, so you really need an adapter with a public destructor:
template <class Facet>
class usable_facet : public Facet {
public:
using Facet::Facet; // inherit constructors
~usable_facet() {}
// workaround for compilers without inheriting constructors:
// template <class ...Args> usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
};
template<typename internT, typename externT, typename stateT>
using codecvt = usable_facet<std::codecvt<internT, externT, stateT>>;
std::wstring_convert<codecvt<char16_t,char,std::mbstate_t>, char16_t> convert; // Elem defaults to wchar_t, so name char16_t explicitly
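Putting the adapter to work, here is a hedged, self-contained sketch. The alias utf16_facet and the helpers to_utf16 and to_utf8 are hypothetical names introduced for this example. It assumes a standard library that actually provides std::codecvt<char16_t, char, std::mbstate_t>, and note that this specialization is deprecated from C++20 on.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::wstring_convert
#include <string>

// Adapter from the answer: derives from the facet so that the
// destructor becomes publicly accessible.
template <class Facet>
class usable_facet : public Facet {
public:
    using Facet::Facet;  // inherit constructors
    ~usable_facet() {}
};

// Hypothetical alias for the UTF-8 <-> UTF-16 facet wrapped in the adapter.
using utf16_facet = usable_facet<std::codecvt<char16_t, char, std::mbstate_t>>;

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
std::u16string to_utf16(const std::string& utf8) {
    // The second template argument must be char16_t; it defaults to wchar_t.
    std::wstring_convert<utf16_facet, char16_t> convert;
    return convert.from_bytes(utf8);
}

// Hypothetical helper: UTF-16 code units -> UTF-8 bytes.
std::string to_utf8(const std::u16string& utf16) {
    std::wstring_convert<utf16_facet, char16_t> convert;
    return convert.to_bytes(utf16);
}
```

wstring_convert default-constructs the facet with new and deletes it in its own destructor, which is why the public destructor matters.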