[SERVER-33920] Optimize transformation from BSON to Document in aggregation framework Created: 15/Mar/18  Updated: 06/Dec/22  Resolved: 10/Jul/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 1
Labels: optimization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-40968 Core DocumentStorage changes for Docu... Closed
duplicates SERVER-40969 No-op Document/Value to BSON conversi... Closed
is duplicated by SERVER-36983 Views are unnecessary slow, even for ... Closed
Assigned Teams:
Query
Participants:

 Description   

Much of the time for any aggregation is spent transforming between the BSON storage format and the in-memory Document format used throughout the aggregation pipeline. A Document is more akin to a hash table: it has quick lookup of fields, and is easy to transform and manipulate. A BSONObj follows the spec from bsonspec.org, which is optimized for compact storage and is therefore more difficult to manipulate. Repeated lookups of field values are expensive, because each lookup has to scan the buffer from the start.

This ticket tracks the work to try to speed up this conversion. Some ideas are:

  1. Lazily load the bson, only looking at the contents of the buffer when someone asks for it.
  2. Keep the original buffer around, and if the document hasn't changed since it was created, just return the original buffer when serializing it.
  3. As fields are requested, store them in a partially-completed hash table.
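
The three ideas above can be sketched together as follows. This is only an illustration in Python, not the server's C++ implementation: the names LazyDocument, encode, and elements_scanned are hypothetical, and the toy parser assumes documents containing only int32 and UTF-8 string fields.

```python
import struct


def encode(fields):
    """Encode a dict of str -> int/str as BSON (int32 and string types only)."""
    body = b""
    for name, value in fields.items():
        key = name.encode("utf-8") + b"\x00"
        if isinstance(value, int):
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + key + struct.pack("<i", len(data)) + data
    # Total size = 4-byte length prefix + elements + trailing 0x00.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


class LazyDocument:
    """Hypothetical lazily parsed view over a BSON buffer."""

    def __init__(self, buf):
        self._buf = buf            # idea 2: keep the original buffer around
        self._cache = {}           # idea 3: partially completed hash table
        self._pos = 4              # scan cursor, just past the length prefix
        self._modified = False
        self.elements_scanned = 0  # instrumentation for the example only

    def _scan_until(self, wanted):
        """Idea 1: parse elements only as far as needed to find `wanted`."""
        buf = self._buf
        while buf[self._pos] != 0:
            type_byte = buf[self._pos]
            name_end = buf.index(b"\x00", self._pos + 1)
            name = buf[self._pos + 1:name_end].decode("utf-8")
            start = name_end + 1
            if type_byte == 0x10:    # int32
                (value,) = struct.unpack_from("<i", buf, start)
                self._pos = start + 4
            elif type_byte == 0x02:  # string: int32 length, bytes, 0x00
                (length,) = struct.unpack_from("<i", buf, start)
                value = buf[start + 4:start + 4 + length - 1].decode("utf-8")
                self._pos = start + 4 + length
            else:
                raise ValueError("type not supported by this sketch")
            self.elements_scanned += 1
            # Do not overwrite a value that set() stored before the scan got here.
            self._cache.setdefault(name, value)
            if name == wanted:
                return

    def get(self, name):
        if name not in self._cache:
            self._scan_until(name)
        return self._cache.get(name)

    def set(self, name, value):
        self._cache[name] = value
        self._modified = True

    def to_bson(self):
        if not self._modified:
            return self._buf       # unchanged: hand back the original buffer
        self._scan_until(None)     # finish parsing, then re-serialize
        return encode(self._cache)
```

For example, after doc = LazyDocument(encode({"a": 1, "b": "hi", "c": 3})), a call to doc.get("a") parses only the first element, and doc.to_bson() returns the original buffer object until doc.set(...) is called. One simplification to be aware of: once the document has been modified, this sketch does not guarantee that the original field order is preserved on re-serialization.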

It's likely much more difficult, but we could also try to support something more akin to the MutableDocument API, where the original document is kept around, possibly in a 'de-serialized' state where updates have been made to it, but not serialized back. This might be better left for follow-up work.



 Comments   
Comment by David Storch [ 10/Jul/19 ]

Ideas #1 and #3 from the description, which describe lazy construction of the internal document storage, have been implemented as part of SERVER-40968. Idea #2 is scheduled and in progress under SERVER-40969. I'm closing this ticket as a duplicate of the more recent tickets under which the development work is actually being completed.

Generated at Thu Feb 08 04:34:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.