[SERVER-43953] Drop backing BSON in Document when possible Created: 10/Oct/19 Updated: 19/Mar/20 Resolved: 19/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Ian Boros | Assignee: | Ian Boros |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | qexec-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Sprint: | Query 2019-11-18, Query 2019-12-02, Query 2020-03-23, Query 2020-04-06 |
| Participants: |
| Description |
|
Documents store a backing BSON object in addition to a cache of fields which have already been examined. This means they may be ~2x the size of the actual user document they represent. This makes copying them slower (we call getOwned() when crossing the find/agg membrane) and increases memory consumption. We should make an effort to drop this backing BSON when it is no longer necessary to hold onto it. If this ticket gets scheduled, someone should think very hard about the conditions under which the backing BSON can be dropped. I would start with the following: 1) The backing BSON has been fully traversed (DocumentStorage::_bsonIt has reached the end). Point (3) is pretty broad, but for a first implementation we could only consider cases where the Document is already modified. |
| Comments |
| Comment by David Storch [ 19/Mar/20 ] |
|
ian.boros I'm happy to close this ticket as "Won't Do". Thanks for taking a look. |
| Comment by Ian Boros [ 11/Mar/20 ] |
|
Now that DocumentStorage no longer maintains a BSONObjIterator over the backing BSON (it performs lookups from the beginning each time), condition (1) does not make sense. We could instead use: (1) The number of elements in the cache is the same as the number of elements in the BSONObj. We can inexpensively get the size of the BSONObj here, since, if that loop never completes, the size of the cache will not be the size of the BSONObj anyway.
With that said, I don't think this will provide much benefit. How often do users run an aggregation which references every field of the document and does not have an inclusion projection or dependency set (which would cause us to create a fully-cached document)? I'm going to talk with Dave after he gets back, but I think we should close this as "Won't Do". |
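For illustration only, the revised condition (1) could be sketched roughly as follows. This is a toy model, not the server's Document or DocumentStorage code: ToyBson, ToyDocument, getField, and maybeDropBacking are hypothetical names, values are plain ints, and field names are assumed to be unique.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a BSONObj: an ordered list of (name, value) pairs.
// Field names are assumed to be unique.
using ToyBson = std::vector<std::pair<std::string, int>>;

// Hypothetical stand-in for Document/DocumentStorage.
class ToyDocument {
public:
    explicit ToyDocument(ToyBson backing) : _backing(new ToyBson(std::move(backing))) {}

    // Looks the field up in the cache first, then scans the backing object from
    // the beginning. Returns true and sets *out when the field exists.
    bool getField(const std::string& name, int* out) {
        assert(out != nullptr);
        auto cached = _cache.find(name);
        if (cached != _cache.end()) {
            *out = cached->second;
            return true;
        }
        if (!_backing) {
            // The backing object was dropped, so every field is already cached.
            return false;
        }
        for (std::size_t i = 0; i < _backing->size(); ++i) {
            if ((*_backing)[i].first == name) {
                // Copy the value out before the backing object may be released.
                *out = (*_backing)[i].second;
                _cache[name] = *out;
                maybeDropBacking();
                return true;
            }
        }
        return false;
    }

    bool hasBacking() const { return static_cast<bool>(_backing); }

private:
    // Revised condition (1): once the cache holds as many elements as the
    // backing object, the backing object is redundant and can be released.
    void maybeDropBacking() {
        if (_backing && _cache.size() == _backing->size()) {
            _backing.reset();
        }
    }

    std::unique_ptr<ToyBson> _backing;
    std::map<std::string, int> _cache;
};
```

In this sketch the backing object is released only after every one of its fields has been requested at least once, which is the same "references every field" case questioned above.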
| Comment by Ian Boros [ 10/Oct/19 ] |
|
Regarding point (3): In the future we could do a coarse analysis of a pipeline to determine whether the original document will be modified or discarded and, if so, construct the Document with a flag that indicates it can drop its backing BSON earlier. For example, any pipeline with a $group, inclusion projection, $lookup, $redact, or $replaceRoot fits this case. If the backing BSON can be dropped in the PlanStage layer, then we don't need to copy when going to DocumentSource land only to throw it away later. |
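A minimal sketch of what such a coarse analysis might look like, assuming a pipeline can be summarized as a list of stage kinds. ToyStage and both function names are hypothetical and do not correspond to real DocumentSource APIs; the set of qualifying stages is taken directly from the examples in this comment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stage kinds. The real pipeline is a list of
// DocumentSource objects, which this sketch does not model.
enum class ToyStage {
    kMatch,
    kSort,
    kGroup,
    kInclusionProjection,
    kLookup,
    kRedact,
    kReplaceRoot
};

// Returns true for the stage kinds named in the comment as modifying or
// discarding the original document.
bool modifiesOrDiscardsDocument(ToyStage stage) {
    switch (stage) {
        case ToyStage::kGroup:
        case ToyStage::kInclusionProjection:
        case ToyStage::kLookup:
        case ToyStage::kRedact:
        case ToyStage::kReplaceRoot:
            return true;
        case ToyStage::kMatch:
        case ToyStage::kSort:
            return false;
    }
    assert(false && "unhandled ToyStage value");
    return false;
}

// Coarse analysis: if any stage modifies or discards the original document,
// the Document could be constructed with a flag allowing it to drop its
// backing BSON earlier.
bool canDropBackingBsonEarly(const std::vector<ToyStage>& pipeline) {
    for (ToyStage stage : pipeline) {
        if (modifiesOrDiscardsDocument(stage)) {
            return true;
        }
    }
    return false;
}
```

The check is deliberately conservative in one direction only: it never inspects stage arguments, so it cannot tell whether a particular $lookup or $redact actually changes a given document.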