This is more complicated than you might think. According to the ECMAScript standard, an identifier is:
an IdentifierName that is not a ReservedWord
so first you would have to check that the identifier is not one of:
instanceof typeof break do new var case else return void catch finally
continue for switch while this with debugger function throw default if
try delete in
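and, in ES5, the literals null, true and false plus the future reserved words class, const, enum, export, extends, import and super (strict mode also reserves implements, interface, let, package, private, protected, public, static and yield). Later editions may reserve others.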
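Here is a minimal sketch of the reserved-word half of the check, assuming ES5 rules. The keyword list is the one above, the literals and future reserved words are the ES5 ones, and the helper name isReservedWord is hypothetical. A Set keeps each lookup to a single hash probe.

```javascript
// ES5 Keywords (the list above).
const KEYWORDS = (
  "instanceof typeof break do new var case else return void catch finally " +
  "continue for switch while this with debugger function throw default if " +
  "try delete in"
).split(" ");

// ES5 also reserves the null and boolean literals and these FutureReservedWords.
const LITERALS = ["null", "true", "false"];
const FUTURE_RESERVED = ["class", "const", "enum", "export", "extends", "import", "super"];

// FutureReservedWords that are reserved only in strict-mode code.
const STRICT_FUTURE_RESERVED = [
  "implements", "interface", "let", "package", "private",
  "protected", "public", "static", "yield",
];

const RESERVED = new Set([...KEYWORDS, ...LITERALS, ...FUTURE_RESERVED]);
const STRICT_RESERVED = new Set([...RESERVED, ...STRICT_FUTURE_RESERVED]);

// Hypothetical helper: true if `name` is an ES5 ReservedWord and therefore
// cannot be used as an identifier. The comparison is case-sensitive.
function isReservedWord(name, strict = false) {
  return (strict ? STRICT_RESERVED : RESERVED).has(name);
}

console.log(isReservedWord("typeof"));    // true
console.log(isReservedWord("let"));       // false
console.log(isReservedWord("let", true)); // true
console.log(isReservedWord("Typeof"));    // false
```

The strict flag exists because ES5 only reserves words like let and yield inside strict-mode code.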
An IdentifierName starts with:
a letter
the $ sign
the _ underscore
and can further comprise any of those characters plus:
a decimal digit
a combining diacritical (accent) character
connector punctuation (Unicode category Pc) and the zero-width non-joiner and zero-width joiner (U+200C, U+200D)
These characters are defined in terms of Unicode character categories, so [A-Z] is incomplete: ξ is a letter; 京 is a letter. You can use all of those in identifiers, including function names.
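On an engine that supports ES2018 Unicode property escapes (\p{...} with the u flag, as in current browsers and Node.js), the start and part rules listed above can be written directly in terms of those categories. This is a sketch of the ES5 definition only. Later editions use the Unicode ID_Start and ID_Continue properties instead, which differ for a few characters. The constant names are illustrative.

```javascript
// Requires ES2018 Unicode property escapes (\p{...} with the `u` flag).

// ES5 UnicodeLetter: general categories Lu, Ll, Lt, Lm, Lo and Nl.
const LETTER = String.raw`\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}`;

// Extra IdentifierPart characters: combining marks (Mn, Mc), decimal digits (Nd),
// connector punctuation (Pc), and ZWNJ / ZWJ (U+200C, U+200D).
const PART_EXTRA = String.raw`\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200C\u200D`;

// IdentifierStart is a letter, $ or _; IdentifierPart adds the extras above.
// (Backslash \uXXXX escapes, which ES5 also allows in identifiers, are not handled.)
const IDENTIFIER_NAME = new RegExp(
  `^[$_${LETTER}][$_${LETTER}${PART_EXTRA}]*$`,
  "u"
);

console.log(IDENTIFIER_NAME.test("ξ"));       // true  (Ll)
console.log(IDENTIFIER_NAME.test("京"));      // true  (Lo)
console.log(IDENTIFIER_NAME.test("e\u0301")); // true  (e + combining acute, Mn)
console.log(IDENTIFIER_NAME.test("\u0301e")); // false (a mark cannot start a name)
console.log(IDENTIFIER_NAME.test("2x"));      // false (a digit cannot start a name)
```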
Unfortunately, JavaScript RegExp (at least before ES2018 added Unicode property escapes) is not Unicode-aware. If you say \w you only get the ASCII alphanumerics plus the underscore. There is no feasible way to check the validity of non-ASCII identifier characters short of carrying around the relevant parts of the Unicode Character Database with your script, which would be very large and clumsy.
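A quick illustration of that \w limitation (a pattern without the u flag; the constant name is illustrative):

```javascript
// \w is shorthand for [A-Za-z0-9_]: ASCII letters, ASCII digits and underscore only.
const WORD_ONLY = /^\w+$/;

console.log(WORD_ONLY.test("name_1")); // true
console.log(WORD_ONLY.test("ξ"));      // false: a Greek letter
console.log(WORD_ONLY.test("京"));     // false: a CJK ideograph
console.log(WORD_ONLY.test("$el"));    // false: $ is valid in identifiers but not in \w
```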
You could try simply allowing all non-ASCII characters, for example:
^[_$a-zA-Z\xA0-\uFFFF][_$a-zA-Z0-9\xA0-\uFFFF]*$
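A minimal sketch putting that pattern together with a reserved-word check. The name isProbablyValidIdentifier is hypothetical, and the reserved list is the keyword list above plus null, true and false (future reserved words are left out for brevity). Because every code unit from U+00A0 to U+FFFF is accepted, the result errs towards false positives: a name containing a no-break space passes even though that character is whitespace in JavaScript.

```javascript
// Permissive IdentifierName test: ASCII letters, $ and _ to start, ASCII digits
// afterwards, and any code unit from U+00A0 to U+FFFF anywhere.
const LOOSE_IDENTIFIER_NAME = /^[_$a-zA-Z\xA0-\uFFFF][_$a-zA-Z0-9\xA0-\uFFFF]*$/;

// ES5 keywords plus the null/boolean literals (future reserved words omitted).
const RESERVED_WORDS = new Set(
  ("instanceof typeof break do new var case else return void catch finally " +
   "continue for switch while this with debugger function throw default if " +
   "try delete in null true false").split(" ")
);

// Hypothetical helper: rejects reserved words, then applies the loose pattern.
function isProbablyValidIdentifier(name) {
  return !RESERVED_WORDS.has(name) && LOOSE_IDENTIFIER_NAME.test(name);
}

console.log(isProbablyValidIdentifier("京都"));     // true
console.log(isProbablyValidIdentifier("typeof"));   // false: reserved word
console.log(isProbablyValidIdentifier("2fast"));    // false: starts with a digit
console.log(isProbablyValidIdentifier("a\u00A0b")); // true, although U+00A0 is whitespace
```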