[SERVER-43953] Drop backing BSON in Document when possible Created: 10/Oct/19  Updated: 19/Mar/20  Resolved: 19/Mar/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ian Boros Assignee: Ian Boros
Resolution: Won't Do Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Query 2019-11-18, Query 2019-12-02, Query 2020-03-23, Query 2020-04-06
Participants:

 Description   

Documents store a backing BSON object in addition to a cache of fields which have already been examined. This means they may be ~2x the size of the actual user document they represent. This makes copying them slower (we call getOwned() when crossing the find/agg membrane), and increases memory consumption. We should make an effort to drop this backing BSON when it is no longer necessary to hold onto.
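The dual representation described above can be sketched as follows. This is an illustrative model only, not MongoDB's actual DocumentStorage API; the type and member names are invented, and a std::string stands in for the raw BSON buffer:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative sketch only -- not the real DocumentStorage interface.
// A "document" holds the raw backing bytes (standing in for BSON) plus a
// cache of fields that have already been examined, so peak memory can
// approach twice the size of the user document it represents.
struct SketchDocument {
    std::string backingBson;                        // raw serialized form
    std::map<std::string, std::string> fieldCache;  // lazily populated

    std::size_t approxMemory() const {
        std::size_t total = backingBson.size();
        for (const auto& [name, value] : fieldCache)
            total += name.size() + value.size();
        return total;
    }

    // Once the backing bytes are no longer needed (e.g. every field is
    // cached and the original BSON will never be handed back to the user),
    // releasing them cuts memory consumption roughly in half.
    void dropBackingBson() {
        backingBson.clear();
        backingBson.shrink_to_fit();
    }
};
```

Dropping the backing bytes also makes a deep copy of the document (as in getOwned() at the find/agg membrane) proportionally cheaper, since there is half as much data to copy.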

If this ticket gets scheduled, someone should think carefully about the conditions under which the backing BSON can be dropped. I would start with the following:

1) The backing BSON has been fully traversed (DocumentStorage::_bsonIt has reached the end)
AND
2) The metadata has been loaded from the backing BSON or the backing BSON has no metadata
AND
3) The original BSON will not be returned to the user, i.e. the Document has been modified, will be modified, or will be discarded. In that case we will never take the toBson() fast path, which returns the original backing BSON.

Point (3) is pretty broad, but for a first implementation we could consider only cases where the Document has already been modified.
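The three conditions combine into a single check, sketched below. The struct and function names are hypothetical and do not correspond to the real DocumentStorage interface; condition (3) is narrowed to the "already modified" case suggested for a first implementation:

```cpp
#include <cassert>

// Hypothetical sketch of the proposed check; names are invented.
struct DropCheckState {
    bool bsonFullyTraversed;      // condition (1): _bsonIt reached the end
    bool metadataLoadedOrAbsent;  // condition (2): metadata loaded, or none exists
    bool documentModified;        // condition (3), narrowed to "already modified"
};

// A modified Document can never take the toBson() fast path that returns
// the original backing BSON, so once (1) and (2) also hold, the backing
// BSON carries no information the cache does not.
bool canDropBackingBson(const DropCheckState& s) {
    return s.bsonFullyTraversed && s.metadataLoadedOrAbsent && s.documentModified;
}
```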



 Comments   
Comment by David Storch [ 19/Mar/20 ]

ian.boros I'm happy to close this ticket as "Won't Do". Thanks for taking a look.

Comment by Ian Boros [ 11/Mar/20 ]

Now that DocumentStorage no longer maintains a BSONObjIterator over the backing BSON (it performs lookups from the beginning each time), condition (1) does not make sense. We could instead use:

(1) The number of elements in the cache is the same as the number of elements in the BSONObj. We can get the BSONObj's element count inexpensively here: the counting loop can bail out as soon as the count exceeds the cache size, because at that point the two sizes cannot match anyway.
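The early-exit count comparison can be sketched like this. The function is hypothetical, and a std::vector stands in for a BSONObj's element sequence:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the revised condition (1): check whether the cache holds as
// many elements as the backing object, bailing out of the count as soon as
// it exceeds the cache size (at that point the sizes cannot match, so the
// full count is never needed). std::vector<int> stands in for the
// backing BSONObj's element sequence.
bool cacheCoversBacking(std::size_t cacheSize, const std::vector<int>& backing) {
    std::size_t count = 0;
    for (int elem : backing) {
        (void)elem;
        if (++count > cacheSize)
            return false;  // early exit: more backing elements than cached ones
    }
    return count == cacheSize;
}
```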


With that said, I don't think this will provide much benefit. How often does one run an aggregation that references every field of the document but has no inclusion projection or dependency set (which would cause us to create a fully cached Document)? I'm going to talk with Dave after he gets back, but I think we should close this as "Won't Do".

Comment by Ian Boros [ 10/Oct/19 ]

Regarding point (3): In the future we could do a coarse analysis of a pipeline to determine whether the original document will be modified or discarded and, if so, construct the Document with a flag indicating that it can drop its backing BSON earlier. For example, any pipeline with a $group, inclusion projection, $lookup, $redact, or $replaceRoot fits this case. If the backing BSON can be dropped in the PlanStage layer, then we don't need to copy it when going to DocumentSource land only to throw it away later.
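A minimal sketch of that coarse analysis, assuming the pipeline is available as a list of stage names. The function and the stage set are invented for illustration; a real implementation would also need to distinguish inclusion projections from other $project specs, which a stage name alone cannot do:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical coarse pipeline analysis: if any stage is known to replace
// or discard the original document, the Document could be constructed with
// a flag allowing it to drop its backing BSON early. The stage list is
// taken from the comment above; inclusion projections are omitted because
// detecting them requires inspecting the $project spec, not just the name.
bool pipelineDiscardsOriginalDocument(const std::vector<std::string>& stages) {
    static const std::set<std::string> kDiscardingStages = {
        "$group", "$lookup", "$redact", "$replaceRoot"};
    for (const auto& stage : stages)
        if (kDiscardingStages.count(stage))
            return true;
    return false;
}
```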

Generated at Thu Feb 08 05:04:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.