The real reason is that indexOf(int) expects a Unicode code point, not a 16-bit UTF-16 "character". Unicode code points are actually up to 21 bits in length.
(The UTF-16 representation of such a code point is actually two 16-bit "character" values. These values are known as the leading and trailing surrogates: 0xD800 to 0xDBFF and 0xDC00 to 0xDFFF respectively; see the Unicode FAQ - UTF-8, UTF-16, UTF-32 & BOM for the gory details.)
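For illustration, here is a small sketch (U+1F600 is just an arbitrary example of a supplementary code point, not something from the question) showing how such a code point maps onto a surrogate pair:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        int cp = 0x1F600;                       // U+1F600, well above 0xFFFF
        char[] units = Character.toChars(cp);   // UTF-16 encoding: a surrogate pair
        System.out.printf("U+%X -> %04X %04X%n",
                cp, (int) units[0], (int) units[1]);
        // prints: U+1F600 -> D83D DE00 (leading surrogate, trailing surrogate)
    }
}
```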
If you give indexOf(int) a code point > 65535, it will search for the pair of UTF-16 characters that encodes that code point.
This is stated by the javadoc (albeit not very clearly), and an examination of the code indicates that this is indeed how the method is implemented.
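For instance, a minimal example (the sample string and the emoji code point are arbitrary choices) of what indexOf(int) returns for a code point above 65535:

```java
public class IndexOfDemo {
    public static void main(String[] args) {
        String s = "ab\uD83D\uDE00c";            // "ab😀c" -- the emoji occupies two char positions
        System.out.println(s.indexOf(0x1F600));  // 2: index of the leading surrogate
        System.out.println(s.indexOf('c'));      // 4: 'c' sits after both surrogate units
        System.out.println(s.length());          // 5: length() counts char units, not code points
    }
}
```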
Why not just use 16-bit characters?
That's pretty obvious. If they did that, there wouldn't be an easy way to locate code points greater than 65535 in Strings. That would be a major problem for people who develop internationalized applications where text may contain such code points. (A lot of supposedly internationalized applications make the incorrect assumption that a char represents a code point. Often it doesn't matter, but increasingly often it does.)
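As a rough illustration of where that assumption breaks down (again using an arbitrary sample string), compare a char-based count with a code-point-based one:

```java
public class CodePointLoop {
    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";   // "a😀b"

        // A char-based view treats each surrogate as a separate "character".
        System.out.println("char units:  " + s.length());                      // 4

        // A code-point-based view sees the emoji as one character.
        System.out.println("code points: " + s.codePointCount(0, s.length())); // 3
        s.codePoints().forEach(cp -> System.out.printf("U+%X%n", cp));
        // prints: U+61, U+1F600, U+62
    }
}
```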
But it shouldn't make any difference to you. The method will still work if your Strings consist only of 16-bit codes ... or, for that matter, only of ASCII codes.