[JAVA-153] BSONEncoder putObject optimization for Map Created: 24/Aug/10 Updated: 29/Oct/10 Resolved: 24/Aug/10 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | None |
| Affects Version/s: | 2.1 |
| Fix Version/s: | 2.2 |
| Type: | Improvement | Priority: | Trivial - P5 |
| Reporter: | Kirk Wylie | Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
In BSONEncoder.putObject(), the core field put loop is: For Map-based implementations of DBObject/BSONObject (the defaults), this is a fast operation. However, if the underlying data isn't directly stored in a Map (but in a List or some other non-random-access data structure), this is relatively inefficient. There are two potential solutions to this: #1 is the solution which doesn't involve backwards incompatibility (as it doesn't involve adding a new method on BSONObject which would have to be implemented by all subclasses), but #2 would be the cleanest in comparison with the existing code. |
| Comments |
| Comment by auto [ 24/Aug/10 ] |
|
Author: {'login': 'scotthernandez', 'name': 'Scott Hernandez', 'email': 'scotthernandez@hotmail.com'}Message: BSONEncoder map optimization |
| Comment by Scott Hernandez (Inactive) [ 24/Aug/10 ] |
|
Patch supplied: http://github.com/scotthernandez/mongo-java-driver/commit/ce35ed2c27d74a06ced8dbd711febab94bdb6283 |
| Comment by Kirk Wylie [ 24/Aug/10 ] |
|
Scott, correct I was. I only upgraded FudgeMsg's MongoDB client driver to 2.1 today, so I was still speaking about the 1.4 status. You know, it might be that Eliot's suggestion is the cleanest one. Our FudgeFieldContainer can easily support a minimal subset of the Map interface (sufficient for that performance trick to work), and I know that quite a few people have actually asked that of us anyway, so if that hack went in it would definitely work for us. |
| Comment by Scott Hernandez (Inactive) [ 24/Aug/10 ] |
|
Kirk is talking about the 1.4 code-base wrt/BasicDBObject, as that is how BasicDBObject used to be if I remember correctly... The instanceof works for the driver code just fine. I'm not sure it addresses fudgemsg code but that is another issue, I think. Once we add the instanceof code then the fudgemsg facet can implement map and just provide a working entrySet() method with the rest stub'd out. |
| Comment by Eliot Horowitz (Inactive) [ 24/Aug/10 ] |
|
BasicDBObject is a map. Not sure I ever want to add more methods to BSONObject since I want to keep it simple to implement in your own classes |
| Comment by Kirk Wylie [ 24/Aug/10 ] |
|
Well, the if (o instanceof Map) won't work for BasicDBObject, as it isn't a Map (has-a rather than is-a). Nor for any implementation/extension of BSONObject (because BSONObject isn't a Map). Your solution will only work for where someone passes a Map directly, which isn't the problem here. An optional slightly ugly thing is to have an ExtendedBSONObject interface with new methods, have existing logic implement that, and check for instanceof inside the code: if (o instanceof ExtendedBSONObject) { That would work as well. Then in a 3.0 driver, having given fair warning, you can merge the methods from ExtendedBSONObject back up to BSONObject. I've done this quite a bit to have backwards compatibility on interfaces where I want to add new methods. |
| Comment by Eliot Horowitz (Inactive) [ 24/Aug/10 ] |
|
I don't think we're going to add a method to BSONObject. The code can be changed to something like this though if ( o instanceof Map ){ Requires no changes to any code or libraries, and should make faster? Thoughts? |
| Comment by Kirk Wylie [ 24/Aug/10 ] |
|
That's precisely why I'm suggesting #2. I'm also synthesizing a DBObject on top of a FudgeFieldContainer, and otherwise I would have to construct a Map on the fly just for that call. |
| Comment by Scott Hernandez (Inactive) [ 24/Aug/10 ] |
|
Yeah, a fork with a pull request would be best. I see the possible performance issue now. I would suggest #2, but I'm sure Eliot has an opinion as well. The problem with #1 is there a few implementations where you need to synthesize the Entry(s) (like ReflectionDBObject) and don't have a map to work from. This sounds like a worse solution for you as well. |
| Comment by Kirk Wylie [ 24/Aug/10 ] |
|
Well, yes. Aside from the fact that the code in question is clearly a hot spot in the client, and sub-optimal (it's virtually always faster to iterate over the entries when you have to consume them all). The particular issue is around the FudgeMsg (www.fudgemsg.org) interconnection with MongoDB. For various reasons, a FudgeMsg doesn't have a Map built in for random access to fields (since most Fudge processors are streaming for performance even though it's a messaging system). Our current MongoDB integration approach involves building BasicDBObjects from a FudgeFieldContainer, but that involves a number of object copies and thrashes the heap under heavy load (even though the BasicDBObjects are short lived). Would it help if I just do the work in a clone off the master in GitHub and send a pull request, or submit a patch? I can do #1 OR #2 in that case (and in fact could do both in different branches). |
| Comment by Scott Hernandez (Inactive) [ 24/Aug/10 ] |
|
Since this isn't an issue now, and I'm not sure it will ever be (since what you describe still doesn't sound like an issue), this seems like a dead ticket. Is there any reason to keep this alive? |