So the clang documentation says (emphasis mine):
This feature allows identifiers to contain certain Unicode characters,
as specified by the active language standard;
This is covered in Annex E of the draft C++ standard, which lists the allowed characters as follows:
E.1 Ranges of characters allowed [charname.allowed]
00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF
0100-167F, 1681-180D, 180F-1FFF
200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F
2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF
3004-3007, 3021-302F, 3031-303F
3040-D7FF
F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD
The code point for infinity, 221E (∞), is not included in the list: it falls between the ranges 2070-218F and 2460-24FF.
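To make the exclusion concrete, here is a minimal Python sketch that transcribes the Annex E.1 ranges above and tests a code point against them. The names ALLOWED_RANGES and is_allowed are hypothetical, not from any library. The sketch models only E.1: it ignores Annex E.2 (characters disallowed at the start of an identifier) and says nothing about how a particular compiler version actually behaves.

```python
# Hypothetical helper, not part of any library: the C++11 Annex E.1
# [charname.allowed] ranges, as inclusive (first, last) pairs.
ALLOWED_RANGES = [
    (0x00A8, 0x00A8), (0x00AA, 0x00AA), (0x00AD, 0x00AD), (0x00AF, 0x00AF),
    (0x00B2, 0x00B5), (0x00B7, 0x00BA), (0x00BC, 0x00BE), (0x00C0, 0x00D6),
    (0x00D8, 0x00F6), (0x00F8, 0x00FF),
    (0x0100, 0x167F), (0x1681, 0x180D), (0x180F, 0x1FFF),
    (0x200B, 0x200D), (0x202A, 0x202E), (0x203F, 0x2040), (0x2054, 0x2054),
    (0x2060, 0x206F),
    (0x2070, 0x218F), (0x2460, 0x24FF), (0x2776, 0x2793), (0x2C00, 0x2DFF),
    (0x2E80, 0x2FFF),
    (0x3004, 0x3007), (0x3021, 0x302F), (0x3031, 0x303F),
    (0x3040, 0xD7FF),
    (0xF900, 0xFD3D), (0xFD40, 0xFDCF), (0xFDF0, 0xFE44), (0xFE47, 0xFFFD),
]
# Planes 1 through 14: 10000-1FFFD, 20000-2FFFD, ..., E0000-EFFFD.
ALLOWED_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xF)]


def is_allowed(cp: int) -> bool:
    """Return True if code point cp lies in one of the Annex E.1 ranges.

    Only E.1 is checked; the extra restriction on the first character
    of an identifier (Annex E.2) is not modelled here.
    """
    return any(first <= cp <= last for first, last in ALLOWED_RANGES)


# Infinity, micro sign, o with diaeresis, circled digit one.
for cp in (0x221E, 0x00B5, 0x00F6, 0x2460):
    print(f"U+{cp:04X}: {is_allowed(cp)}")
# U+221E: False
# U+00B5: True
# U+00F6: True
# U+2460: True
```

Note that an ASCII letter such as a also returns False here. That is expected: Annex E only constrains extended characters, while ASCII letters, digits and the underscore come from the ordinary identifier grammar.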
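For reference, these are the codes above converted to Unicode characters (some of them may not display correctly in all browsers or fonts; code points that are invisible, combining, unassigned or rarely supported by fonts are written in U+ notation):
¨, ª, U+00AD, ¯, ²-µ, ·-º, ¼-¾, À-Ö, Ø-ö, ø-ÿ
Ā-ᙿ, ᚁ-U+180D, U+180F-U+1FFF
U+200B-U+200D, U+202A-U+202E, ‿-⁀, ⁔, U+2060-U+206F
⁰-U+218F, ①-⓿, ❶-➓, Ⰰ-U+2DFF, ⺀-U+2FFF
〄-〇, 〡-U+302F, 〱-〿
U+3040-U+D7FF
豈-ﴽ, U+FD40-U+FDCF, ﷰ-﹄, ﹇-�
𐀀-U+1FFFD, 𠀀-U+2FFFD, 𰀀-U+3FFFD, U+40000-U+4FFFD, U+50000-U+5FFFD, U+60000-U+6FFFD, U+70000-U+7FFFD, U+80000-U+8FFFD, U+90000-U+9FFFD, U+A0000-U+AFFFD, U+B0000-U+BFFFD, U+C0000-U+CFFFD, U+D0000-U+DFFFD, U+E0000-U+EFFFD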
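A rendering like the one above can be generated with a short script instead of by hand, which also avoids the encoding mix-ups that copy-pasting tends to introduce. This is a minimal sketch using Python's standard unicodedata module; render, render_range and render_ranges are hypothetical names, and the choice of which general categories count as "would not display" is an assumption made for illustration.

```python
import unicodedata

# Assumption: treat these general categories as "would not display well":
# format controls (Cf), unassigned (Cn), nonspacing marks (Mn),
# private use (Co) and surrogates (Cs). They are shown as U+XXXX instead.
HIDDEN = {"Cf", "Cn", "Mn", "Co", "Cs"}


def render(cp: int) -> str:
    """Return the character for cp, or U+XXXX notation if it is in HIDDEN."""
    ch = chr(cp)
    if unicodedata.category(ch) in HIDDEN:
        return f"U+{cp:04X}"
    return ch


def render_range(first: int, last: int) -> str:
    """Render an inclusive range as 'first-last' (or a single character)."""
    if first == last:
        return render(first)
    return f"{render(first)}-{render(last)}"


def render_ranges(ranges) -> str:
    """Render a list of (first, last) pairs as a comma-separated line."""
    return ", ".join(render_range(first, last) for first, last in ranges)
```

For example, render_ranges([(0x00B2, 0x00B5), (0x3040, 0xD7FF)]) yields "²-µ, U+3040-U+D7FF". Category data comes from the Unicode version bundled with the Python build (see unicodedata.unidata_version), so code points assigned in recent Unicode versions may be classified differently on older interpreters.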
I could not find a comprehensive document covering the rationale for the chosen ranges, although N3146: Recommendations for extended identifier characters for C and C++ does provide some details on the influences.