[COLLECTIONS-878] MapUtils.invertMap improve HashMap construction#652
[COLLECTIONS-878] MapUtils.invertMap improve HashMap construction#652ytatichno wants to merge 2 commits into
Conversation
|
Hello @ytatichno -1: You are confusing initial capacity and load factor. Step through the debugger for |
|
Thank you for your instant feedback! I fully agree that the constructor parameter of HashMap(int initialCapacity) is not the same as the table size and that load factor remains constant (0.75) regardless input map's loadFactor. The optimization I proposed is not related to the load factor value itself, but to avoiding a resize that can happen when the map is filled exactly up to its declared size. For example, when using new HashMap<>(map.size()), the internal threshold becomes (int)(map.size() * 0.75). Using Math.ceil(map.size() / 0.75d) as initial capacity avoids that resize and results in a table that is sized exactly to hold all entries without rehashing. This is also consistent with the JDK’s approach in HashMap.calculateHashMapCapacity (introduced in Java 19). Here it is: /**
* Calculate initial capacity for HashMap based classes, from expected size and default load factor (0.75).
*
* @param numMappings the expected number of mappings
* @return initial capacity for HashMap based classes.
* @since 19
*/
static int calculateHashMapCapacity(int numMappings) {
return (int) Math.ceil(numMappings / (double) DEFAULT_LOAD_FACTOR);
}
/**
* Creates a new, empty HashMap suitable for the expected number of mappings.
* The returned map uses the default load factor of 0.75, and its initial capacity is
* generally large enough so that the expected number of mappings can be added
* without resizing the map.
*
* @param numMappings the expected number of mappings
* @param <K> the type of keys maintained by the new map
* @param <V> the type of mapped values
* @return the newly created map
* @throws IllegalArgumentException if numMappings is negative
* @since 19
*/
public static <K, V> HashMap<K, V> newHashMap(int numMappings) {
if (numMappings < 0) {
throw new IllegalArgumentException("Negative number of mappings: " + numMappings);
}
return new HashMap<>(calculateHashMapCapacity(numMappings));
}
```
I can add a short test showing the extra resize if you think it would clarify my intentions. |
|
Hello @ytatichno |
|
Hello @garydgregory It's kind of hard to mock JRE classes for application level testing without unpleasant experience, so I took the core logic from invertMap and created a small isolated test that proves the improvement. The test uses an input map with 14 entries. It builds two output maps: one using the original capacity formula and one using the proposed formula and adds the same entries in the same order. After each put, it checks via reflection whether the internal table array has changed identity. The first change on each map (from null to a freshly allocated array) is the lazy initialization; any subsequent change is a genuine resize. I see that link for old formula changed twice: one time for initialization from null to not null link and one extra time that we perhaps don't want to have. And the link for new formula changed only once for initialization from null to freshly allocated array link. Note that on Java 9+ it requires Regarding the 0.75d in the patch: would you prefer it to be extracted as a named constant, or is a short explanatory comment sufficient? And as You remember, this extra resize triggered for 50% of sizes of input map and won't be reproduced on 12 elements test i.e. |
According to https://issues.apache.org/jira/projects/COLLECTIONS/issues/COLLECTIONS-878
Total summary:
Why
0.75d:We have extra resize for 50% of Maps. In example:
Let's inspect for maps with size on [4, 16].
To sum up:
For each power of two we have segment from 50% to 100%. Because 50% is previous power of two. And for each initial size from 50% to 75% (DEFAULT_LOAD_FACTOR) everything ok, but for sizes from 75% to 100% we have extra resize. So we have 50% of probability to have extra resize.