The real reason is that indexOf(int) expects a Unicode code point, not a 16-bit UTF-16 "character". Unicode code points are actually up to 21 bits in length.
(The UTF-16 representation of such a code point is actually two 16-bit "character" values. These values are known as the leading and trailing surrogates: 0xD800 to 0xDBFF and 0xDC00 to 0xDFFF respectively; see the Unicode FAQ - UTF-8, UTF-16, UTF-32 & BOM for the gory details.)
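For illustration, here is a small sketch (U+1F600 is just an arbitrary example of a supplementary code point, not something from the question) showing how such a code point maps onto a surrogate pair:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        int cp = 0x1F600;                       // U+1F600, well above 0xFFFF
        char[] units = Character.toChars(cp);   // UTF-16 encoding: a surrogate pair
        System.out.printf("U+%X -> %04X %04X%n",
                cp, (int) units[0], (int) units[1]);
        // prints: U+1F600 -> D83D DE00 (leading surrogate, trailing surrogate)
    }
}
```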
If you give indexOf(int) a code point > 65535, it will search for the pair of UTF-16 characters that encodes that code point.
This is stated by the javadoc (albeit not very clearly), and an examination of the code indicates that this is indeed how the method is implemented.
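For instance, a minimal example (the sample string and the emoji code point are arbitrary choices) of what indexOf(int) returns for a code point above 65535:

```java
public class IndexOfDemo {
    public static void main(String[] args) {
        String s = "ab\uD83D\uDE00c";            // "ab😀c" -- the emoji occupies two char positions
        System.out.println(s.indexOf(0x1F600));  // 2: index of the leading surrogate
        System.out.println(s.indexOf('c'));      // 4: 'c' sits after both surrogate units
        System.out.println(s.length());          // 5: length() counts char units, not code points
    }
}
```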
Why not just use 16-bit characters?
That's pretty obvious. If they did that, there wouldn't be an easy way to locate code points greater than 65535 in Strings. That would be a major problem for people who develop internationalized applications where text may contain such code points. (A lot of supposedly internationalized applications make the incorrect assumption that a char represents a code point. Often it doesn't matter, but increasingly often it does.)
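As a rough illustration of where that assumption breaks down (again using an arbitrary sample string), compare a char-based count with a code-point-based one:

```java
public class CodePointLoop {
    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";   // "a😀b"

        // A char-based view treats each surrogate as a separate "character".
        System.out.println("char units:  " + s.length());                      // 4

        // A code-point-based view sees the emoji as one character.
        System.out.println("code points: " + s.codePointCount(0, s.length())); // 3
        s.codePoints().forEach(cp -> System.out.printf("U+%X%n", cp));
        // prints: U+61, U+1F600, U+62
    }
}
```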
But it shouldn't make any difference to you. The method will still work if your Strings consist only of 16-bit codes ... or, for that matter, only of ASCII codes.