First, it’s not failing reliably. I managed to have some runs where no exception occurred. This, however, doesn’t imply that the resulting map is correct. It’s also possible that each thread witnesses its own value being successfully put, while the resulting map misses several mappings.
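To see how a put can vanish without any exception, here is a minimal, deterministic single-threaded sketch of such an interleaving; the tiny `Object[]` “table” and the step ordering are hypothetical stand-ins for the real HashMap internals:

```java
import java.util.Arrays;

// Hypothetical miniature "table" (one slot per key, no hashing), replaying
// the racy interleaving deterministically in a single thread.
public class LostUpdateDemo {
    public static void main(String[] args) {
        Object[] table = new Object[2];      // the shared table

        // Thread A reads the table reference, then is "paused".
        Object[] seenByA = table;

        // Thread B resizes: allocates a bigger array, publishes it,
        // and performs its own put - which succeeds.
        table = Arrays.copyOf(table, 4);
        table[1] = "B";

        // Thread A resumes and puts into its stale snapshot - from A's
        // perspective this put also succeeds.
        seenByA[0] = "A";

        // Yet the published table is missing A's mapping.
        System.out.println("A visible: " + (table[0] != null)); // false
        System.out.println("B visible: " + (table[1] != null)); // true
    }
}
```

Both “threads” observed a successful put, but only one mapping survived in the array that everyone else will read.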
But indeed, failing with a NullPointerException happens quite often. I created the following debug code to illustrate the HashMap’s working:
import java.lang.reflect.AccessibleObject;
import java.lang.reflect.Field;

static <K,V> void debugPut(HashMap<K,V> m, K k, V v) {
    if(m.isEmpty()) debug(m);
    m.put(k, v);
    debug(m);
}
private static <K,V> void debug(HashMap<K,V> m) {
    for(Field f: FIELDS) try {
        System.out.println(f.getName()+": "+f.get(m));
    } catch(ReflectiveOperationException ex) {
        throw new AssertionError(ex);
    }
    System.out.println();
}
static final Field[] FIELDS;
static {
    String[] name = { "table", "size", "threshold" };
    Field[] f = new Field[name.length];
    for(int ix = 0; ix < name.length; ix++) try {
        f[ix] = HashMap.class.getDeclaredField(name[ix]);
    } catch(NoSuchFieldException ex) {
        throw new ExceptionInInitializerError(ex);
    }
    AccessibleObject.setAccessible(f, true);
    FIELDS = f;
}
Using this with the simple sequential for(int i=0; i<5; i++) debugPut(m, i, i); printed:
table: null
size: 0
threshold: 1
table: [Ljava.util.HashMap$Node;@70dea4e
size: 1
threshold: 1
table: [Ljava.util.HashMap$Node;@5c647e05
size: 2
threshold: 3
table: [Ljava.util.HashMap$Node;@5c647e05
size: 3
threshold: 3
table: [Ljava.util.HashMap$Node;@33909752
size: 4
threshold: 6
table: [Ljava.util.HashMap$Node;@33909752
size: 5
threshold: 6
As you can see, due to the initial capacity of 0, three different backing arrays are created even during this sequential operation. Each time the capacity is increased, there is a higher chance that a racy concurrent put misses the array update and creates its own array.
This is especially relevant for the initial state of an empty map and several threads trying to put their first key, as all threads might encounter the initial state of a null table and create their own. Also, even when reading the state of a completed first put, there is a new array created for the second put as well.
But step-by-step debugging revealed even more chances of breaking:
Inside the method putVal, we see at the end:
++modCount;
if (++size > threshold)
    resize();
afterNodeInsertion(evict);
return null;
In other words, after the successful insertion of a new key, the table will get resized if the new size exceeds the threshold. So on the first put, resize() is called at the beginning because the table is null, and since your specified initial capacity is 0, i.e. too low to store one mapping, the new capacity will be 1 and the new threshold will be 1 * loadFactor == 1 * 0.75f == 0.75f, truncated to 0. So right at the end of the first put, the new threshold is exceeded and another resize() operation is triggered. So with an initial capacity of 0, the first put already creates and populates two arrays, which gives much higher chances to break if multiple threads perform this action concurrently, all encountering the initial state.
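The threshold arithmetic can be checked directly; the (int) cast truncates towards zero, which is why 0.75f becomes 0. The variable names below merely mirror the ones used in resize() for illustration:

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        final float loadFactor = 0.75f;   // HashMap's default load factor
        int newCap = 1;                   // smallest capacity holding one mapping
        int newThr = (int)(newCap * loadFactor); // 0.75f, truncated by the cast
        System.out.println(newThr);       // prints 0 - exceeded by the very first entry
    }
}
```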
And there is another point. Looking into the resize() operation, we see the lines:
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
    … (transfer old contents to new array)
In other words, the new array reference is stored into the heap before it has been populated with the old entries, so even without any reordering of reads and writes, there is a chance that another thread reads that reference without seeing the old entries, including the one it has written itself previously. In fact, optimizations that reduce heap access may lower the chance of a thread not seeing its own update in an immediately following query.
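This window, too, can be replayed deterministically. The sketch below freezes a reader at the moment the (still empty) new array has been published but not yet filled; the names and the single-threaded step ordering are illustrative only:

```java
public class EmptyPublishDemo {
    public static void main(String[] args) {
        Object[] table = { "X", null };   // one existing entry

        // Resizing thread: the new array reference is stored first ...
        Object[] newTab = new Object[4];
        table = newTab;                   // published while still empty

        // ... a reader sampling the table right now finds nothing:
        Object seen = table[0];           // null - "X" is temporarily gone

        // ... and only afterwards are the old contents transferred.
        newTab[0] = "X";

        System.out.println("seen during resize: " + seen);  // null
        System.out.println("after transfer: " + table[0]);  // X
    }
}
```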
Still, it must also be noted that the assumption that everything runs interpreted here is unfounded. Since HashMap is used by the JRE internally as well, even before your application starts, there is also a chance of encountering already compiled code when using HashMap.
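If the map really has to be populated by multiple threads, the standard remedy is a thread-safe map such as java.util.concurrent.ConcurrentHashMap, which guarantees that every completed put is visible afterwards. A small sketch (the thread and key counts are arbitrary):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 4, perThread = 1_000;
        Map<Integer,Integer> map = new ConcurrentHashMap<>(); // safe for concurrent put
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;                   // disjoint key ranges
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) map.put(base + i, base + i);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();                    // wait for all puts
        System.out.println(map.size());                       // prints 4000 - nothing lost
    }
}
```

Unlike the plain HashMap runs above, this reliably ends with all 4000 mappings present.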