[JAVA-822] cache the hash calculation in ObjectId Created: 01/May/13 Updated: 25/Jun/13 Resolved: 25/Jun/13 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Ryan Nitz | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Description |
|
Object ids are used in maps and sets all over the place. It makes sense to cache the hash computation in a similar manner as the String.hashCode() method. From String.java:
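The quoted snippet from String.java did not survive this export; for reference, the JDK's String.hashCode() caches its result roughly as follows (paraphrased from OpenJDK, not copied from the original report):

```java
// Paraphrased from the JDK's String.java: the hash is computed once and
// stored in the 'hash' field, and later calls simply return the cached value.
public int hashCode() {
    int h = hash;                        // cached value; 0 means "not computed yet"
    if (h == 0 && value.length > 0) {
        char[] val = value;
        for (int i = 0; i < val.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;                        // cache for subsequent calls
    }
    return h;
}
```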
|
| Comments |
| Comment by Ryan Nitz [ 23/May/13 ] |
|
I deleted the old attachment and added my latest test. The variance in the numbers was GC kicking in. By manually calling gc after each test, here are the results: string: 54. This is still odd... but it seems like the overhead is not that great. |
| Comment by Trisha Gee [ 23/May/13 ] |
|
Oh, that makes sense - I set the initial capacity in my benchmark. Well, the caching will not actually take effect until you start re-using the same objects; these benchmarks simply prove the cost of calculating it the first time. ObjectId is a fairly simple calculation for the hash, but String needs to hash every character in the String. You might find that when adding a String to further HashSets you get the benefit of caching, since only then will the method make use of the cached value. |
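A minimal sketch of the re-use described here, with invented names - the second pass calls hashCode() again on the same String instances, so only it can benefit from the cached value:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration (not from the ticket): the first pass computes and
// caches each String's hash; the second pass re-uses the cached values.
public class HashReuseSketch {
    public static void main(String[] args) {
        String[] keys = new String[1000000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "key-" + i;
        }

        Set<String> first = new HashSet<String>(keys.length);
        long start = System.currentTimeMillis();
        for (String k : keys) {
            first.add(k);    // computes and caches each String's hash
        }
        System.out.println("first set:  " + (System.currentTimeMillis() - start) + " ms");

        Set<String> second = new HashSet<String>(keys.length);
        start = System.currentTimeMillis();
        for (String k : keys) {
            second.add(k);   // re-uses the cached hash on the same instances
        }
        System.out.println("second set: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```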
| Comment by Ryan Nitz [ 23/May/13 ] |
|
Hmmm... it's the resizing (growth) of the set. If you set the initial capacity to objIds.length, you see something similar to what you saw. I added an initial call to the String's hashCode() into the initialization loop and the output is: objectid: 69. Now I am actually interested in how it is possible that the ObjectId is faster. |
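For reference, the pre-sizing mentioned here looks something like the sketch below (hypothetical helper, not from the ticket); giving the HashSet a capacity up front reduces the number of resize-and-rehash cycles as elements are added:

```java
import java.util.HashSet;
import java.util.Set;
import org.bson.types.ObjectId;

// Hypothetical sketch: pre-size the HashSet with the number of ids, which
// reduces how often the set has to grow and re-hash its contents.
public class PresizedSetSketch {
    static Set<ObjectId> toSet(ObjectId[] objIds) {
        Set<ObjectId> ids = new HashSet<ObjectId>(objIds.length);
        for (ObjectId id : objIds) {
            ids.add(id);
        }
        return ids;
    }
}
```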
| Comment by Trisha Gee [ 23/May/13 ] |
|
OK, so here's the deal - the first run effectively warms up the JVM, so it's slower. In your test case this is the ObjectId run, so it takes a lot longer. When I run your test on my machine, my results correspond to yours: However, when I switch round the order you run them in, so that the String goes first, I get: So this test does not conclusively prove that adding Strings is quicker than ObjectIds. Code for the switched test:
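The attached code for the switched test is not preserved in this export; it presumably looked something like the sketch below (class and variable names are guesses) - the only change from the original is that the String run goes first, so it absorbs the JVM warm-up:

```java
import java.util.HashSet;
import java.util.Set;
import org.bson.types.ObjectId;

// Guessed reconstruction of the "switched" test: identical in shape to the
// original benchmark, except the String run goes first and absorbs the warm-up.
public class SwitchedOrderBenchmark {
    private static final int COUNT = 1000000;

    public static void main(String[] args) {
        String[] strings = new String[COUNT];
        ObjectId[] objIds = new ObjectId[COUNT];
        for (int i = 0; i < COUNT; i++) {
            strings[i] = "key-" + i;
            objIds[i] = new ObjectId();
        }

        Set<String> stringSet = new HashSet<String>(COUNT);
        long start = System.currentTimeMillis();
        for (String s : strings) {
            stringSet.add(s);
        }
        System.out.println("string: " + (System.currentTimeMillis() - start));

        Set<ObjectId> idSet = new HashSet<ObjectId>(COUNT);
        start = System.currentTimeMillis();
        for (ObjectId id : objIds) {
            idSet.add(id);
        }
        System.out.println("objectid: " + (System.currentTimeMillis() - start));
    }
}
```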
|
| Comment by Trisha Gee [ 23/May/13 ] |
|
Oops, you did already! Checking... |
| Comment by Trisha Gee [ 23/May/13 ] |
|
I've created a very simple microbenchmark that creates 1,000,000 ObjectIds, then times how long it takes to add all of these into a HashSet. Without warming up the JVM, this gives: When I do the same thing with a million different Strings and time how long it takes to add them to a HashSet, I get: In this benchmark, adding 1,000,000 unique Strings into a HashSet is substantially slower than adding 1,000,000 ObjectIds to a HashSet. This suggests to me that we are not testing the same thing. Can you share your test code please? |
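The benchmark itself is not attached to this export; its ObjectId half presumably had roughly the shape below (names are invented; the String half mirrors it with 1,000,000 distinct Strings):

```java
import java.util.HashSet;
import java.util.Set;
import org.bson.types.ObjectId;

// Invented reconstruction of the described microbenchmark: build 1,000,000
// ObjectIds up front, then time only the HashSet insertion.
public class ObjectIdHashSetBenchmark {
    public static void main(String[] args) {
        ObjectId[] ids = new ObjectId[1000000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = new ObjectId();
        }

        Set<ObjectId> set = new HashSet<ObjectId>(ids.length);
        long start = System.currentTimeMillis();
        for (ObjectId id : ids) {
            set.add(id);
        }
        System.out.println("objectid: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```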
| Comment by Ryan Nitz [ 23/May/13 ] |
|
Output from the test (in ms): objectid: 819. This tests adding 1,000,000 items to a set. |
| Comment by Trisha Gee [ 23/May/13 ] |
|
But if they're all different IDs, I don't believe caching each individual result will help? Unless I'm missing something - how can caching the result of each single hashCode calculation improve the case of adding 100k ObjectIds? Have you got a test case I can look at? |
| Comment by Ryan Nitz [ 23/May/13 ] |
|
I have a set with 100k object ids in it, and adding each new id is expensive. |
| Comment by Trisha Gee [ 23/May/13 ] |
|
It does seem like an obvious place for optimisation. However, mathematical calculations are something the CPU is extremely good at; they take very few cycles. So even if this is called many times, it's not a very expensive operation (unless you show me a performance test that proves otherwise). In addition, this is exactly the sort of calculation that the JVM is very good at optimising - caching is only valuable if hashCode() for an individual ObjectId is called many times, and if that is the case the JVM will probably optimise the call for you anyway. If there's evidence this is causing performance issues, we will address it. |