[JAVA-1924] When using upsert=true with a custom codec, encodercontext returns isEncodingCollectibleDocument = false Created: 14/Aug/15  Updated: 31/Aug/15  Resolved: 27/Aug/15

Status: Closed
Project: Java Driver
Component/s: API, BSON, Codecs
Affects Version/s: 3.0.0, 3.0.1, 3.0.2, 3.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Guillermo Campelo Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: Bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible

 Description   

when using :
MongoCollection<SomeEntity> gameCollection = database.getCollection("entities", SomeEntity.class);

gameCollection.findOneAndUpdate(filter, update, new FindOneAndUpdateOptions().upsert(true));

SomeEntityCodec's encode method is called with a context that returns encodingCollectibleDocument=false

See:
com.mongodb.MongoCollectionImpl : 374
toBsonDocument(update)

bson.toBsonDocument(documentClass, codecRegistry)

neither RawBsonDocument(wrapped, codec), nor BsonDocumentWrapper(wrappedDocument, encoder) takes into account the upsert flag(if it is true, it SHOULD be a encodingCollectibleDocument



 Comments   
Comment by Ross Lawley [ 27/Aug/15 ]

Hi guicamest,

Just to be clear, any implementation of CollectibleCodec must include the _id field in it's encode method, independently of it is an insert or an update? If that's OK, please close this issue as not a bug.

This is definitely is not true and not the case. Lets look at the Document class. It has a DocumentCodec that is an implementation of the CollectibleCodec<Document>, when inserting, we generate the _id and mutate the document - allowing the user to know the _id of the inserted document. Should encode always include an _id? No! because Documents are used for all sorts of things, queries, sorts, index definitions etc... Making encode always include the _id in most circumstances would be wrong.

Remember as jeff.yemin mentioned previously, isEncodingCollectibleDocument is for the cases when you are representing a BSON Document as it would be stored by MongoDB. In these scenarios _id if present comes first. That is all that flag is for.

Should isEncodingCollectibleDocument = true always be the case?

This is nuanced and depends on the situation. For example take the following compound index: {price: 1, _id: 1}. If isEncodingCollectibleDocument was always true that would make the index mean something totally different.

When should a CollectibleCodec encode a _id into the document?

In what circumstances do we know that we are actually representing a document as it will be stored in MongoDB? The answer is only when doing an insert. So only in those circumstances can we call generateIdIfAbsentFromDocument.

(This may or may not add an _id, depending on implementation, if it doesn't then MongoDB will generate the _id).

What about Upserts?

Update queries can take multiple keys, in the Bson representation of the update eg : $set $setOnInsert. Does the Bson type representing an update, represent a Document as it is to be stored by MongoDB? The answer is no, but the $set part could represent the document or it could equally be updates to fragments of the Document that is stored in MongoDB.

I hope that clarifies the current design and the reasons it acts as such. As the Bson update provided to an upsert doesn't necessarily represent a Document as it will be stored, I'm closing this ticket as "Won't Fix". Currently, you will manually have to ensure _id is in either the query or in the $setOnInsert part of the update.

However, I have added JAVA-1939 to track the possibility of injecting a /{$setOnInsert: {_id: generatedId}/ into upserts when using CollectibleCodecs which if feasible, could achieve what you are after!

Ross

Comment by Guillermo Campelo [ 27/Aug/15 ]

Hi Ross, thanks for the depth of the explanation!

IMHO, the parameter should be renamed, as a CollectibleCodec is directly related to ids and isEncodingCollectibleDocument gives the idea of whether the id should be generated or not. May I suggest the isInsert name?

Just to be clear, any implementation of CollectibleCodec must include the _id field in it's encode method, independently of it is an insert or an update? If that's OK, please close this issue as not a bug.

Thanks a lot for the clarification about this issue and keep up with the good work!

Comment by Ross Lawley [ 24/Aug/15 ]

Hi guicamest,

I can understand the confusion, however, the driver does not know if any given upsert operation will result in an update or an insert. Only the MongoDB server can tell. If the operation results in an insert and the _id isn't supplied, the MongoDB server automatically generates one. For that reason we can't automatically generate an _id for upserts that are inserts rather than updates.

I hope that clarifies the reasons why isEncodingCollectibleDocument is false for upserts.

Ross

Comment by Guillermo Campelo [ 19/Aug/15 ]

Hi Jeff, any updates?

Comment by Guillermo Campelo [ 15/Aug/15 ]

It's kind of confusing, CollectibleCodec states A Codec that generates complete BSON documents for storage in a MongoDB collection and has methods for checking/generating ids, but as you say, encodingCollectibleDocument has nothing to do with the _id field except for the order.

Besides that, how is a custom CollectibleCodec supposed to know when to generate an _id field or not? If it is an upsert=false, then it shouldn't generate the _id field, but if it is an upsert=true and the document is to be inserted, it should generate it.

Comment by Jeffrey Yemin [ 14/Aug/15 ]

No, that's not what the flag indicates. While one could potentially use it for other purposes, It's currently used in the driver only to encode the fields in such a way that the _id field is placed first in the BSON representation of the document.

Comment by Guillermo Campelo [ 14/Aug/15 ]

Well, the flag indicates whether or not include the "_id" field in the serialization, correct? If it is an upsert=true, it might be an insert, in which case the _id should be serialized.

Comment by Jeffrey Yemin [ 14/Aug/15 ]

Yes, that's the expected behavior, and it is not governed at all by the upsert flag. The encodingCollectibleDocument property will be true only when encoding a document that is going to be inserted in a collection, but never for filters or updates (which use $ operators like $set). You can see it's intended usage here, for example.

What are you trying to use the flag for?

Generated at Thu Feb 08 08:55:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.