[JAVA-2339] Reduce memory requirements for Document and BsonDocument when the number of keys is small Created: 10/Oct/16 Updated: 15/Feb/18 Resolved: 15/Feb/18 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | BSON |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jeffrey Yemin | Assignee: | Jeffrey Yemin |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Both Document and BsonDocument have an internal LinkedHashMap to store the entries in order. This is not the most efficient data structure to use when the number of entries is small. Consider using just an ArrayList to handle small numbers of entries, and only start using a hash table when the number of entries grows to more than a handful (size of handful TBD). |
| Comments |
| Comment by Jeffrey Yemin [ 15/Feb/18 ] |
|
I wrote a simple implementation of a Map which starts as an ArrayList and then adds a backing HashMap when the size goes above a threshold. Source code is here. But benchmarking with jmh reveals that it's slower than LinkedHashMap in most cases, so I'm abandoning the effort. |
| Comment by Ross Lawley [ 08/Dec/17 ] |
|
aharris the final implementation is undecided, it could be a simple list of BsonElement where each element would contain a key and value. Bson is field order sensitive, so we must maintain insertion order and couldn't use a Set. What the value of a given key is somewhat arbitary, n-dimension documents are still just key-value pairs all the way down. We'd obviously have to benchmark any changes to ensure we don't negative impact users. The CSharp driver has a List backing their BsonDocument.cs implementation and inspiration may be drawn from that. Please keep watching this ticket and we'll update with more information once progress has started. Ross |
| Comment by Andrew Harris [ 08/Dec/17 ] |
|
I wasn't talking about from a users perspective. I was merely asking how you would envisage implementing a multi-dimensional Set which supports countless imbedded Sets in an unordered single dimensional Array without introducing complex wrapper objects (which would be more expensive)? In the end, you would essentially have yet another Set implementation anyway. I suspect that even if it were possible the overhead of managing such a mechanism would far outway any resource savings you were trying to achieve in the first place. LinkedHashMap is based, of course, on a double linked list which I'll agree uses more resources than say a normal HashMap but it is required for consistent field ordering. Json isn't field order sensitive in itself but if your application might be (and certainly your unit tests might be). |
| Comment by Ross Lawley [ 08/Dec/17 ] |
|
aharris this would be an internal optimization for performance / memory usage reasons. From a users perspective there would be no difference to how they can be used. |
| Comment by Andrew Harris [ 08/Dec/17 ] |
|
Lists are not Sets so it would awkward to store a field name and value in a one-dimensional list or are you suggesting that two or more Lists are used? Also, how would this cater for n-dimensional embedded documents? |