java - Why is string.intern() so slow?

Question

Welcome To Ask or Share your Answers For Others

java - Why is string.intern() so slow?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

java - Why is string.intern() so slow?

Before anyone questions the fact of using string.intern() at all, let me say that I need it in my particular application for memory and performance reasons. ^[1]

So, until now I used String.intern() and assumed it was the most efficient way to do it. However, I noticed since ages it is a bottleneck in the software. ^[2]

Then, just recently, I tried to replace the String.intern() by a huge map where I put/get the strings in order to obtain each time a unique instance. I expected this would be slower... but it was exactly the opposite! It was tremendously faster! Replacing the intern() by pushing/polling a map (which achieves exactly the same) resulted in more than one order of magnitude faster.

The question is: why is intern() so slow?!? Why isn't it then simply backed up by a map (or actually, just a customized set) and would be tremendously faster? I'm puzzled.

[1]: For the unconvinced ones: It is in natural language processing and has to process gigabytes of text, therefore needs to avoid many instances of a same string to avoid blowing up the memory and referential string comparison to be fast enough.

[2]: without it (normal strings) it is impossible, with it, this particular step remains the most computation intensive one

EDIT:

Due to the surprising interest in this post, here is some code to test it out:

http://pastebin.com/4CD8ac69

And the results of interning a bit more than 1 million strings:

HashMap: 4 seconds
String.intern(): 54 seconds

Due to avoid some warm-up / OS IO caching and stuff like this, the experiment was repeated by inverting the order of both benchmarks:

String.intern(): 69 seconds
HashMap: 3 seconds

As you see, the difference is very noticeable, more than tenfolds. (Using OpenJDK 1.6.0_22 64bits ...but using the sun one resulted in similar results I think)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:29:28+0000

This article discusses the implementation of String.intern(). In Java 6 and 7, the implementation used a fixed size (1009) hashtable so as the number entries grew, the performance became O(n). The fixed size can be changed using -XX:StringTableSize=N. Apparently, in Java8 the default size is larger but issue remains.

Categories

java - Why is string.intern() so slow?

java - Why is string.intern() so slow?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags