Much of the time spent in any aggregation goes to converting between the BSON storage format and the in-memory Document format used throughout the aggregation pipeline. A Document is more akin to a hash table: it has quick lookup of fields and is easy to transform and manipulate. A BSONObj follows the spec from bsonspec.org, which is optimized for compact storage and is therefore more difficult to manipulate. Repeated lookups of field values in a BSONObj are expensive, because each lookup scans the elements from the start of the buffer.
This ticket tracks the work to speed up this conversion. Some ideas are:
- Lazily load the BSON, only looking at the contents of the buffer when a field is requested.
- Keep the original buffer around, and if the document hasn't changed since it was created, just return the original buffer when serializing it.
- As fields are requested, store them in a partially-completed hash table.
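The three ideas above can be combined in one structure. The following is a minimal sketch in Python, not the server's C++ implementation: `LazyDocument`, `encode`, and the `fields_decoded` counter are hypothetical names introduced for illustration, and only the BSON int32 (0x10) and UTF-8 string (0x02) element types are handled. It assumes that a write forces a full decode, which keeps field order simple at the cost of some laziness.

```python
import struct


def encode(fields):
    """Encode a dict of int/str values as a BSON document (int32 and string only)."""
    body = b""
    for name, value in fields.items():
        key = name.encode() + b"\x00"
        if isinstance(value, int):
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            data = value.encode() + b"\x00"
            body += b"\x02" + key + struct.pack("<i", len(data)) + data
    # 4-byte total length, elements, then a single 0x00 terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


class LazyDocument:
    """Hypothetical lazily decoded view over a BSON buffer."""

    def __init__(self, buf):
        self._buf = buf          # original buffer, never modified
        self._cache = {}         # partially completed hash table of decoded fields
        self._pos = 4            # scan cursor, just past the length prefix
        self._dirty = False      # set once any field has been written
        self.fields_decoded = 0  # instrumentation for this sketch only

    def _scan_until(self, name):
        """Decode elements from the cursor until `name` is found or the buffer ends."""
        buf = self._buf
        while buf[self._pos] != 0:
            type_byte = buf[self._pos]
            key_end = buf.index(b"\x00", self._pos + 1)
            key = buf[self._pos + 1:key_end].decode()
            start = key_end + 1
            if type_byte == 0x10:
                value = struct.unpack_from("<i", buf, start)[0]
                self._pos = start + 4
            elif type_byte == 0x02:
                (size,) = struct.unpack_from("<i", buf, start)
                value = buf[start + 4:start + 4 + size - 1].decode()
                self._pos = start + 4 + size
            else:
                raise ValueError("unsupported BSON type in this sketch: %#x" % type_byte)
            self._cache[key] = value
            self.fields_decoded += 1
            if key == name:
                return True
        return False

    def get(self, name, default=None):
        # Serve from the cache when possible; otherwise resume the scan.
        if name in self._cache or self._scan_until(name):
            return self._cache[name]
        return default

    def set(self, name, value):
        # Assumption: finish decoding before the first write so field order is kept.
        self._scan_until(None)
        self._cache[name] = value
        self._dirty = True

    def to_bson(self):
        # Unchanged documents hand back the original buffer without re-encoding.
        if not self._dirty:
            return self._buf
        return encode(self._cache)
```

In this sketch a read touches only the elements up to the requested field, a repeated read is a dictionary lookup, and `to_bson()` on an unmodified document returns the very buffer it was built from. A real implementation would also need to cover nested documents, arrays, and the remaining BSON types.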
It's likely much more difficult, but we could also try to support something more akin to the MutableDocument API, where the original document is kept around, possibly in a 'de-serialized' state in which updates have been applied but not yet serialized back. This might be better left for follow-up work.