Well, I'm not sure whether Marko replicated the original mistake intentionally. TL;DR: the new instance is never used, so it gets eliminated. Fixing the benchmark reverses the result. Don't trust faulty benchmarks; learn from them.
Here's the JMH benchmark:
    import java.util.concurrent.TimeUnit;

    // Early JMH API. Current releases spell these @Benchmark and
    // org.openjdk.jmh.infra.Blackhole.
    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.logic.BlackHole;

    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @Warmup(iterations = 3, time = 1)
    @Measurement(iterations = 3, time = 1)
    @Fork(3)
    @State(Scope.Thread)
    public class Chars {

        // Source needs to be a @State field to avoid constant optimizations
        // on sources. Results need to be sunk into the BlackHole to
        // avoid dead-code elimination.
        private String string;

        @Setup
        public void setup() {
            string = "12345678901234567890";
            for (int i = 0; i < 10; i++) {
                string += string;
            }
        }

        // The *_DCE variants replicate the original mistake: c is never used.
        @GenerateMicroBenchmark
        public void newChar_DCE(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = new Character(string.charAt(i));
            }
        }

        @GenerateMicroBenchmark
        public void justChar_DCE(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = Character.valueOf(string.charAt(i));
            }
        }

        // The boxed Character is consumed, so it has to be produced.
        @GenerateMicroBenchmark
        public void newChar(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = new Character(string.charAt(i));
                bh.consume(c);
            }
        }

        @GenerateMicroBenchmark
        public void justChar(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                Character c = Character.valueOf(string.charAt(i));
                bh.consume(c);
            }
        }

        // The *_prim variants unbox right away and consume only the primitive char.
        @GenerateMicroBenchmark
        public void newChar_prim(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                char c = new Character(string.charAt(i));
                bh.consume(c);
            }
        }

        @GenerateMicroBenchmark
        public void justChar_prim(BlackHole bh) {
            int len = string.length();
            for (int i = 0; i < len; i++) {
                char c = Character.valueOf(string.charAt(i));
                bh.consume(c);
            }
        }
    }
...and this is the result:
    Benchmark                 Mode  Samples     Mean  Mean error  Units
    o.s.Chars.justChar        avgt        9   93.051       0.365  us/op
    o.s.Chars.justChar_DCE    avgt        9   62.018       0.092  us/op
    o.s.Chars.justChar_prim   avgt        9   82.897       0.440  us/op
    o.s.Chars.newChar         avgt        9  117.962       4.679  us/op
    o.s.Chars.newChar_DCE     avgt        9   25.861       0.102  us/op
    o.s.Chars.newChar_prim    avgt        9   41.334       0.183  us/op
DCE stands for "Dead Code Elimination", and that is what the original benchmark suffers from. Once we eliminate that effect, which in JMH means sinking the values into the Blackhole, the score reverses: Character.valueOf (93.051 us/op) beats new Character() (117.962 us/op). In retrospect, that seems to indicate that new Character() in the original code benefits hugely from DCE, while Character.valueOf does not benefit nearly as much. I'm not sure we should discuss why, because this has no bearing on real-world use cases, where the produced Characters are actually used.
You can go further on two fronts from here:
- Get the assembly for the benchmark methods to confirm the conjecture above. See PrintAssembly.
- Run with more threads. The difference between returning a cached Character and instantiating a new one would diminish as we increase the number of threads and consequently hit the "allocation wall".
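For reference, here is a minimal sketch of what "cached" versus "new" means at the object level. The class and method names are made up for illustration. Character.valueOf is specified to cache values in the range '\u0000' to '\u007F', and the benchmark string consists only of ASCII digits, so justChar should never allocate, while newChar allocates on every iteration unless the JIT manages to remove the allocation.

```java
// Illustration only: CharCacheDemo and its method names are hypothetical.
final class CharCacheDemo {

    // Character.valueOf caches values in '\u0000'..'\u007F', so two calls
    // with the same ASCII char return the very same object.
    static boolean valueOfIsCached(char ch) {
        return Character.valueOf(ch) == Character.valueOf(ch);
    }

    // The constructor always yields a fresh object, so the identity
    // comparison is false. It is deprecated since Java 9; it is used here
    // only because the benchmark uses it.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean constructorIsCached(char ch) {
        return new Character(ch) == new Character(ch);
    }

    public static void main(String[] args) {
        System.out.println("valueOf returns cached instance: " + valueOfIsCached('1'));
        System.out.println("constructor returns cached instance: " + constructorIsCached('1'));
    }
}
```

This only demonstrates identity semantics. It says nothing about timing, which is what the benchmark is for.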
UPD: Following up on Marko's question, it does seem the major impact comes from eliminating the allocation itself, whether via escape analysis (EA) or DCE; see the *_prim tests.
UPD2: Looked into the assembly. The same run with -XX:-DoEscapeAnalysis confirms that the major effect comes from eliminating the allocation, and that the elimination is the work of escape analysis:
    Benchmark                 Mode  Samples     Mean  Mean error  Units
    o.s.Chars.justChar        avgt        9   94.318       4.525  us/op
    o.s.Chars.justChar_DCE    avgt        9   61.993       0.227  us/op
    o.s.Chars.justChar_prim   avgt        9   82.824       0.634  us/op
    o.s.Chars.newChar         avgt        9  118.862       1.096  us/op
    o.s.Chars.newChar_DCE     avgt        9   97.530       2.485  us/op
    o.s.Chars.newChar_prim    avgt        9  101.905       1.871  us/op
This proves the original DCE conjecture is incorrect: EA is the major contributor.
The DCE results are still faster because we do not pay the cost of unboxing, or of treating the returned value with any respect in general. The benchmark is faulty in that regard nevertheless.
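To make the escape analysis point a bit more concrete, here is a rough plain-Java sketch of the two shapes involved, outside of JMH. All names in it (EscapeSketch, checksumNoEscape, checksumEscaping, kept) are hypothetical. In the first method the box is unboxed immediately and never leaves the loop body, which is the shape of newChar_prim and the kind of allocation EA may be able to remove. In the second the box is stored where other code can see it, which is closer to newChar, so the allocation has to happen. Whether the JIT actually removes the first allocation depends on the JVM and its flags, so treat this as an illustration, not a measurement.

```java
// Sketch only: every name in this class is made up for illustration.
final class EscapeSketch {

    // Visible to other code, so anything stored here escapes the method.
    static Character kept;

    // Same shape as newChar_prim: the Character is unboxed right away and
    // no reference to it survives the statement that created it.
    @SuppressWarnings({"deprecation", "removal"})
    static int checksumNoEscape(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = new Character(s.charAt(i)); // auto-unboxed immediately
            sum += c;
        }
        return sum;
    }

    // Closer to newChar: the boxed object itself is handed to something
    // outside the method, so it cannot simply be replaced by a char.
    @SuppressWarnings({"deprecation", "removal"})
    static int checksumEscaping(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            Character c = new Character(s.charAt(i));
            kept = c;  // the reference escapes here
            sum += c;  // unboxed for the addition
        }
        return sum;
    }
}
```

Both methods compute the same value; only the fate of the allocation differs, and that is what the -XX:-DoEscapeAnalysis run above isolates.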