During query execution, when we need to look up a Value inside a Document, there are two possible code paths:
- The Value already resides in DocumentStorage.
- We need to create the Value by reading the backing BSON object.
The second case requires a linear scan of the BSON to find the BSONElement with the correct field name. During this scan, we currently eagerly convert every BSONElement we encounter into a Value and store it in the DocumentStorage cache. This ensures that only one forward scan of the BSON object is ever necessary, which avoids the potentially quadratic cost of looking up n fields in a Document. However, our performance tests show that this eager conversion of BSONElement to Value is costly. In many cases it is likely wasted work, since we might never read those fields again. To improve performance, we should pull into DocumentStorage only the fields that the caller actually looked up.
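As a rough illustration of the proposed behavior, here is a minimal, self-contained sketch. It does not use the real Document, DocumentStorage, Value, or BSONElement classes: ToyDocument, ToyBacking, getField, and the two counters are hypothetical stand-ins. The backing object is modeled as an ordered list of (field name, value) pairs, and "conversion" is modeled as copying into a hash map.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real server types.
using ToyElement = std::pair<std::string, int>;  // (field name, raw value)
using ToyBacking = std::vector<ToyElement>;      // models the backing BSON object

class ToyDocument {
public:
    explicit ToyDocument(ToyBacking backing) : _backing(std::move(backing)) {}

    // Returns the value for 'name', or std::nullopt if the field is absent.
    std::optional<int> getField(const std::string& name) {
        // Code path 1: the value already resides in the cache.
        if (auto it = _cache.find(name); it != _cache.end()) {
            return it->second;
        }
        // Code path 2: linear scan of the backing object. Elements that do
        // not match are skipped rather than converted and cached.
        for (const auto& [fieldName, raw] : _backing) {
            ++_elementsScanned;
            if (fieldName == name) {
                _cache.emplace(fieldName, raw);  // cache only the requested field
                return raw;
            }
        }
        return std::nullopt;
    }

    std::size_t cachedFieldCount() const { return _cache.size(); }
    std::size_t elementsScanned() const { return _elementsScanned; }

private:
    ToyBacking _backing;
    std::unordered_map<std::string, int> _cache;
    std::size_t _elementsScanned = 0;
};
```

In this sketch, a repeated lookup of the same field is served from the cache, while a lookup of a different field triggers another scan of the backing object. That is the trade-off described above: less conversion work per lookup in exchange for possibly rescanning.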
In the future, if we find that repeated scans of the backing BSON are expensive, we can implement query analysis to compute the set of fields that are likely to be looked up later, and then pull only these fields into the document's cache.