Can instanceof make Unmodifiable Collections faster?
In a battle-hardened code base, all the ordinary bugs have been fixed - the ones that remain are usually quite bizarre. This is the story of one such bug and how it uncovered a less-than-optimal implementation in the JDK itself.
The Bug
Do you know those bugs that only show up after the program has been running for a while?
Hibernate's Gunnar Morling recently told the story of one that, after a few days, would lead to a StackOverflowError on calling Set::isEmpty.
He tracked it down to a faulty implementation of a cache for sets:
On each access, the cache would obtain a Set, wrap it in Collections::unmodifiableSet, and then reinsert it, thus piling more and more unmodifiable wrappers on the underlying set.
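The article doesn't show the cache's code, but a hypothetical sketch (all names and types made up) of the kind of logic that produces this pile-up could look like this:

import java.util.*;

// hypothetical sketch of the faulty caching logic (names made up):
// every read re-wraps the cached set and stores the wrapper back,
// so each access stacks another unmodifiable wrapper on top
class SetCache {

    private final Map<String, Set<Integer>> setsByKey = new HashMap<>();

    Set<Integer> get(String key) {
        Set<Integer> set = setsByKey.computeIfAbsent(key, k -> new HashSet<>());
        Set<Integer> wrapped = Collections.unmodifiableSet(set);
        // bug: the wrapper replaces the (possibly already wrapped) cached set
        setsByKey.put(key, wrapped);
        return wrapped;
    }
}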
The thing is, the methods on the wrapper (take isEmpty as an example) can't do anything themselves and have to forward to the underlying collection:
// actual implementation of
// java.util.Collections.UnmodifiableCollection#isEmpty
// (`c` is the wrapped collection)
public boolean isEmpty() {
    return c.isEmpty();
}
Pile up enough wrappers, each forwarding method calls to the next, and you will overflow the stack. It's turtles all the way down.
The Reason
The immediate reason for the bug was the faulty cache implementation, but there's something else here:
Why does Collections::unmodifiableSet wrap an already unmodifiable set?
Couldn't it check whether the specified set is already unmodifiable and return it as-is if that's the case?
From a technical point of view, nothing prevents that, and the method's contract doesn't call for returning a new instance either.
The remaining question is, does it make a difference?
The Benchmark
Surely, forwarding method calls is a waste of time, right?
Or is there some clever optimization that can nullify that effect?
One straightforward way to find out is to benchmark it.
Simply wrap a Set into unmodifiable wrappers and check how long various operations take.
Besides iteration and isEmpty, I also want to benchmark the effect on linear and random access, so I test lists as well as sets.
(You can find the code on GitHub and the results in this Google Spreadsheet. I use JMH - if you want to learn about it, check out the samples.)
Setup
Here are my specs:
- CPU: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
- RAM: Samsung DDR3 16GB @ 1.60GHz (the tests ran entirely in RAM)
- OS: Gentoo, kernel version 4.14.83-gentoo
- Java: OpenJDK 11.0.1
- JMH: 1.21
To set up the benchmarks, I create collections with size elements and wrap them depth times.
Putting aside efforts to reuse code, this is what it looks like for HashSet:
// create the underlying collection with `size` elements
Set<Integer> underlyingCollection = new Random()
    .ints(size).mapToObj(i -> i)
    .collect(Collectors.toCollection(HashSet::new));

// wrap it `depth` times - benchmarks use `unmodifiableCollection`
Set<Integer> unmodifiableCollection = underlyingCollection;
for (int i = 0; i < depth; i++)
    unmodifiableCollection = Collections.unmodifiableSet(unmodifiableCollection);
return unmodifiableCollection;
For lists it looks very similar, but in addition, I create int[] getAts.
This int array has the same length as the list and contains random list indices.
I will use it to access the wrapped list in an unpredictable pattern.
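The article doesn't show that part of the setup, but a minimal sketch (assuming the underlying list is created analogously to the set above and is called underlyingCollection) could look like this:

// fill `getAts` with one random index into the list per element
int[] getAts = new Random()
    .ints(underlyingCollection.size(), 0, underlyingCollection.size())
    .toArray();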
Benchmark methods
These are the benchmark methods for both lists and sets:
@Benchmark
public boolean isEmpty() {
    return unmodifiableCollection.isEmpty();
}

@Benchmark
public void forEach(Blackhole hole) {
    unmodifiableCollection.forEach(hole::consume);
}

@Benchmark
public void iterator(Blackhole hole) {
    for (int i : unmodifiableCollection)
        hole.consume(i);
}
A few words on the JMH features I use here:
- @Benchmark lets JMH know that I want it to benchmark that method
- returning the result of calling isEmpty lets JMH make sure the JIT compiler doesn't treat the method as dead code (i.e. code that has no effect and is hence redundant), which it would otherwise eliminate (see the counterexample below)
- sending the list's elements into a Blackhole is a more pedestrian approach to the same effect
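For contrast, here's a hypothetical benchmark (not part of the actual suite) that discards the result and would thus be vulnerable to exactly that kind of elimination:

@Benchmark
public void isEmptyDeadCode() {
    // the result is ignored, so the JIT may consider the call dead code
    // and eliminate it, which would make the measurement meaningless
    unmodifiableCollection.isEmpty();
}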
On lists, there are two more things I want to benchmark - linear and random access:
@Benchmark
public void linearAccess(Blackhole hole) {
    List<Integer> list = unmodifiableCollection();
    for (int i = 0; i < list.size(); i++)
        hole.consume(list.get(i));
}

@Benchmark
public void randomAccess(Blackhole hole) {
    List<Integer> list = unmodifiableCollection();
    for (int at : getAts)
        hole.consume(list.get(at));
}
Enough with the build-up - what are the results?!
Measurements
I ran the benchmarks with varying collections (HashSet, LinkedHashSet, ArrayList, LinkedList), collection sizes (10k, 100k, 1m), and wrapping depths (0, 1, 10, 100, 1000; I also dabbled with 10k, but it would often lead to a StackOverflowError).
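The actual benchmark code is on GitHub; as a rough sketch of how such parameters are typically declared with JMH (class and field names here are made up):

import java.util.Set;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class WrappingBenchmark {

    @Param({ "10000", "100000", "1000000" })
    int size;

    @Param({ "0", "1", "10", "100", "1000" })
    int depth;

    Set<Integer> unmodifiableCollection;

    @Setup
    public void createAndWrapCollection() {
        // create the underlying collection and wrap it `depth` times,
        // as shown in the setup snippet above
    }
}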
Keeping in mind that it's the last factor we're really interested in, here are a few observations that reduce the amount of data we have to look at:
- the collection type has no relevant effect; of course different collections have different absolute run times, but the wrapping depth always has a very similar influence
- the collection size has the effect you would expect, independent of wrapping depth:
  - for a constant operation like isEmpty, collection size doesn't matter
  - for iterating operations, each 10x increase in size roughly leads to a 10x increase in run time (a notable exception seems to be HashSet, where the factor seems to be about x15, but let's not dwell on that here)
Now let's look at how run times depend on wrapping depth.
The following data is for an ArrayList with a million elements, but as we've just established, it's representative of other collection types and sizes.
Benchmark | Depth | Score | Unit
---|---|---|---
forEach | 0 | 6499.524 ± 116.113 | μs/op
forEach | 1 | 6466.625 ± 79.003 | μs/op
forEach | 10 | 6572.141 ± 49.209 | μs/op
forEach | 100 | 6429.473 ± 35.505 | μs/op
forEach | 1000 | 6756.348 ± 103.719 | μs/op
isEmpty | 0 | 0.003 ± 0.001 | μs/op
isEmpty | 1 | 0.004 ± 0.001 | μs/op
isEmpty | 10 | 0.015 ± 0.001 | μs/op
isEmpty | 100 | 0.205 ± 0.004 | μs/op
isEmpty | 1000 | 2.420 ± 0.020 | μs/op
iterator | 0 | 5729.565 ± 429.593 | μs/op
iterator | 1 | 7343.901 ± 334.612 | μs/op
iterator | 10 | 35199.093 ± 285.387 | μs/op
iterator | 100 | 417297.905 ± 9552.693 | μs/op
iterator | 1000 | 4778937.048 ± 46199.758 | μs/op
linearAccess | 0 | 3774.128 ± 39.618 | μs/op
linearAccess | 1 | 4714.176 ± 22.733 | μs/op
linearAccess | 10 | 30394.111 ± 218.719 | μs/op
linearAccess | 100 | 417612.502 ± 4279.777 | μs/op
linearAccess | 1000 | 4661914.890 ± 30868.705 | μs/op
randomAccess | 0 | 7231.825 ± 587.417 | μs/op
randomAccess | 1 | 7406.953 ± 692.301 | μs/op
randomAccess | 10 | 26135.629 ± 688.995 | μs/op
randomAccess | 100 | 243328.907 ± 6699.941 | μs/op
randomAccess | 1000 | 2457828.925 ± 17852.538 | μs/op
Let's start with isEmpty.
It has virtually no cost, but forwarding the call down the seemingly endless chain of wrappers does.
You can see how that cost scales roughly linearly with wrapping depth.
Then, take a look at forEach.
At first, it may seem surprising that its run time is so stable.
Then again, consider the implementation in the unmodifiable wrapper:
public void forEach(Consumer<? super E> action) {
    c.forEach(action);
}
As you can see, it doesn't really do anything except forward the given consumer to the next wrapper.
We can see that iterating over a million elements takes about 6,500 μs, and we know from isEmpty that forwarding a call down a thousand wrappers takes about 2 μs.
Given those numbers, the stable performance isn't surprising after all.
Finally, let's have a look at iteration, linear access, and random access.
Unsurprisingly, randomAccess starts off slower than linearAccess because it jumps all over the underlying data structure and hence constantly produces cache misses.
But then random access catches up and eventually overtakes linear access - what the hell?!
Take a look at the methods' code again:
While linearAccess makes two forwarding calls per loop iteration (to size and get), randomAccess makes only one (to get), so it stands to reason that - as unwrapping begins to dominate execution time - the added cost of cache misses becomes less relevant.
Comparing different methods, we can deduce that the first wrapper (i.e. depth 1) has a measurable but not dominant performance impact. Most importantly, though:
There's a clear linear relation between wrapping depth and run time for all iteration methods except forEach.
And while wrapping a collection often enough to overflow the stack is surely the consequence of a bug, wrapping it a few, maybe even a dozen, times can happen easily in an environment where constructors routinely wrap incoming lists.
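As a hypothetical example of how a handful of wrappers can accumulate without any bug, consider a defensive constructor (the class is made up):

import java.util.Collections;
import java.util.List;

class Order {

    private final List<String> items;

    Order(List<String> items) {
        // if `items` is already an unmodifiable wrapper (because the caller
        // did the same thing), this adds another layer on top of it
        this.items = Collections.unmodifiableList(items);
    }

    List<String> items() {
        return items;
    }
}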
Benchmarking Instance Checks
It is easy to keep unmodifiable collections from piling up - simply add an instanceof check to Collections::unmodifiable... that returns the given instance if it already is unmodifiable:
public static <T> Set<T> unmodifiableSetWithInstanceCheck(Set<T> s) {
    if (s instanceof UnmodifiableSet)
        return s;
    return new UnmodifiableSet<>(s);
}
While the change is easy, is it also cheap?
Here are the results of a benchmark that wraps a Set 1, 10, 100, 1k, and 10k times:
Benchmark | Depth | Score | Unit
---|---|---|---
originalUnmodifiable | 1 | 8.508 ± 0.770 | ns/op
originalUnmodifiable | 10 | 50.997 ± 2.696 | ns/op
originalUnmodifiable | 100 | 413.526 ± 74.531 | ns/op
originalUnmodifiable | 1000 | 3405.815 ± 382.755 | ns/op
originalUnmodifiable | 10000 | 32218.869 ± 3729.403 | ns/op
withInstanceOfCheck | 1 | 6.236 ± 0.514 | ns/op
withInstanceOfCheck | 10 | 15.241 ± 0.686 | ns/op
withInstanceOfCheck | 100 | 108.349 ± 1.621 | ns/op
withInstanceOfCheck | 1000 | 964.785 ± 19.586 | ns/op
withInstanceOfCheck | 10000 | 9575.831 ± 197.473 | ns/op
Wait, what? How is the instance check faster?! Here's my theory:
- while we add an instanceof check, we avoid the creation of unnecessary UnmodifiableSet instances (memory allocation is dirt cheap, but it's not free)
- we greatly reduce pressure on the garbage collector because we create way fewer instances
The problem with that theory is that it doesn't explain why withInstanceOfCheck is faster even for depth 1.
Investigating that goes beyond the scope of this article, though, and so I'll leave that as an exercise for the reader.
Or a future post.
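One way to probe the allocation part of that theory would be to rerun the wrapping benchmarks with JMH's GC profiler attached - a minimal sketch (the included benchmark name is made up):

import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunWithGcProfiler {

    public static void main(String[] args) throws Exception {
        Options options = new OptionsBuilder()
            // hypothetical name of the wrapping benchmark class
            .include("UnmodifiableWrapping")
            // reports allocation rate and GC counts per benchmark
            .addProfiler(GCProfiler.class)
            .build();
        new Runner(options).run();
    }
}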
The Conclusion
As we have seen, wrapping a collection with Collections.unmodifiable... once has next to no effect, but doing it a few times on performance-critical collections can noticeably harm performance.
Since the technical and performance considerations as well as the methods' contracts allow an implementation that doesn't wrap already unmodifiable collections again, I think that change should be made.
NB: The copyOf methods on List, Set, and Map (introduced in Java 10) already behave this way, so they do not suffer from the same drawback.
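A minimal sketch of that behavior (the printed result reflects current OpenJDK builds - it's an implementation detail, not a guarantee of the specification):

import java.util.List;

public class CopyOfDemo {

    public static void main(String[] args) {
        // List.of already returns an unmodifiable list
        List<Integer> original = List.of(1, 2, 3);
        // copyOf recognizes that and does not wrap or copy again
        List<Integer> copy = List.copyOf(original);
        System.out.println(copy == original); // prints true on OpenJDK 11
    }
}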