[JAVA-2416] BSONReader mark - add support to unset a mark (in contrast to resetting), and add support for multiple marks Created: 22/Dec/16 Updated: 28/Jan/18 Resolved: 15/Mar/17 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | BSON, Codecs |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.0 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Michael Weber | Assignee: | Jeffrey Yemin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
n/a |
||
| Issue Links: |
|
||||||||||||||||
| Description |
|
It would be great if a mark set on a org.bson.BsonReader could be unset. Right now only resetiing to the mark(ed) position is possible. My use case(s): use case 2) The main issue here is, that I cannot be sure that I don’t know if the nested decoder uses a mark itself. I hope I did not misunderstood the concept behind the BSONReader logic in general. If so, please make me aware of my wrong thinking. |
| Comments |
| Comment by Githook User [ 15/Mar/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'jyemin', u'name': u'Jeff Yemin', u'email': u'jeff.yemin@10gen.com'}Message: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 04/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Michael, That's an interesting use case, and not one I've encountered before. Thanks for taking the time to think it through. I'll tentatively schedule this for the next release. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Weber [ 04/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Jeff, it's actually a real use case for us. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 02/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I think what you'd need to do is set the mark at the start of the document, and if an exception is thrown then reset to the mark and then skip the entire document (via skipValue). In that case, you would need more than one mark. But is that a real use case? Would your application function properly if one or more documents can't be decoded? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Weber [ 02/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ok, cool. I was asking myself how to implement proper exception handling in case of reading multiple entities in one go. Is there a better way of implementing a faukt tolerant exception handling? I know I could wrap any reader.readXYZ with a try catch. But this seems somehow odd to me. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 02/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
That's true about regular expressions but that type is rarely stored in documents and decoded. It's almost always used in queries and therefore only encoded by the driver. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Weber [ 02/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Jeff, thanks for clearing things up for me.
It seems in case of regular expressions the byte buffer is read until a null byte is found. I agree that an unoptimized algorithm makes the code much easier and I might opt for this solution. Though I feel that other users of the java driver may stumble upon the "mark cannot be unset or set multiple times" problematic as well in different contexts in the future. I may be mistaken though. Thank a lot once more. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 02/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In the current implementation, there is a single buffer underlying the entire response from the server, and therefore it is not garbage collected until decoding of the entire response is complete. Whether the mark is set has no effect on this.
It's actually not expensive, as the sizes are stored directly in the BSON structure and therefore don't need to be calculated. See http://bsonspec.org/spec.html for details. Let me know if this information changes the situation sufficiently to justify the unoptimized algorithm. Since the type field is likely to be one of the first fields, usually the decoder won't have to skip much at all. In the worst case it will have to skip each metadata field as well as the payload field, but the payload field is cheap to skip since, as indicated above, its size is encoded into the BSON structure. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Weber [ 02/Jan/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Jeff, the algorithm you describe is more or less how I implemented the reflection codec in the first run. Let me explain my use case in more detail in order to demonstrate the need for a more fine grained mark/unmark/reset control. We decided to split any entity into meta data and the actual payload whereby the meta data are properties stored as first level properties and the payload is stored in the field payload. All entites have identical meta data fields ("realId", "version", "metaInfoA", "type", etc.) We could have put all meta data fields into a dedicated field to compose them, but for some reason that I do not recall (maybe some mongo internals like partial index or so??) we put them into level zero. Depending on the field "type" the payload contains different properties.
The resulting java-Pojo has a different structure as follows:
The java code to decode this looks as follows:
As MetaData is a known structure for all entities we optimized the decoding in a way that all meta data is read independently from the resulting type. In the end when the type is known and the payload is decoded, the meta data is injected into that java object. I am aware I could follow your suggested algorithm as described in your previous comment to avoid this issue. I hope you can follow my motivation for implementing a fine grained control over marks. Cheers Michael | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 29/Dec/16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Michael, I don't think it's necessary to have more than one mark to build a reflection-based codec. The pseudocode I imagine would look something like this:
So the mark is only set when finding the type, then immediately reset. By the time you actually decode the specific type, which is where the recursion happens, the mark has already been reset. I believe that takes care of use case 2. As for use case 1, the decoder already has to handle the case where the type field is not the first, and ignore it. So while it would be a nice optimization to not have to do that when the type field has to be first, it's not strictly necessary for correctness. And by using BsonReader.skipValue() when that field is encountered during decoding, you can avoid the cost of decoding the type string if you'd like. My thinking about this is informed by looking at our .NET driver, which has a full reflection-based codec included with it. While it's BsonReader has the more extensive bookmarking capabilities that you're requesting, those capabilities are not actually used by its own decoder. Let me know what you think. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Weber [ 23/Dec/16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Jeff, thanks for your answers. Assuming you would write a Codec similar to org.bson.codecs.DocumentCodec If now (a) mark(s) would be set (they are not used in DocumentCodec currently) these "nested" Codecs would throw the following error in case they would rely on setting a mark themself. Fortunately I haven't seen the use of mark() in the Codecs that are shipped with the mongo driver, so the problem does not exist there. In my case I would love to use marks more often. And I actually nest my ReflectionCodec, hence I nest mark()s A solotion whereby I could simply overwrite the last set mark (as you proposed in your first comment) is not feasable since I do not know, if the wrapping decode needs to unset the reader to a previous remembered state. Either marks are put on a stack structure and they need to be unset/reset in the order they are put on the readers stack or as you pointed out handles to set marks are returned in order to have fine grained control. I hope you now understand my use case? If not, let me know and I can assemble a code example. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 22/Dec/16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
For reference, here is prior art from the C# driver: https://github.com/mongodb/mongo-csharp-driver/blob/ee97df791d0a42853295bcc5cd791e37b8bb076d/src/MongoDB.Bson/IO/BsonReader.cs#L119-L119 Rather than requiring a stack, it exposes each bookmark as an object whose position can be returned to in any order. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 22/Dec/16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Michael, I understand use case 1 and it seems that it would be enough to simply relax the current restriction that a mark can only be set if there is no active mark, and the only way to clear a mark is to reset it. In other words, this should be allowed:
Alternatively, or in addition, we could add BsonReader.clearMark(), which has the advantage of communicating to the BsonReader implementation that the current mark no longer has to be remembered. For the second use case, I'm less clear on what you need. Would a stack of marks be sufficient, or do you need more control than that? |