When an ExclusionProjectionExecutor is used to apply a projection like:
The code walks the input document and, for each field, determines whether to "project" it. Because the code that applies the projection is generalized to work for both inclusion and exclusion projections, the value of each field is always fetched before that decision is made. Reading the field's value loads it into the document cache.
For exclusion projections, the value of an excluded field doesn't matter, since we're going to drop it. The work of loading it into the cache is therefore completely wasted.
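The difference between fetching the value first and checking the field name first can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: LazyDocument, project_generic, and project_exclusion_aware are hypothetical names, and the "cache" here is just a dict standing in for the real document cache.

```python
class LazyDocument:
    """Toy stand-in for a lazily parsed document (hypothetical, not server code)."""

    def __init__(self, raw):
        self._raw = dict(raw)
        self.cache = {}  # fields whose values have been materialized

    def field_names(self):
        return list(self._raw)

    def get_value(self, name):
        # Reading a value copies it into the cache; this is the costly step.
        if name not in self.cache:
            value = self._raw[name]
            self.cache[name] = list(value) if isinstance(value, list) else value
        return self.cache[name]


def project_generic(doc, excluded):
    """Generic path: the value is fetched before deciding whether to keep it."""
    out = {}
    for name in doc.field_names():
        value = doc.get_value(name)  # loads even fields that will be dropped
        if name not in excluded:
            out[name] = value
    return out


def project_exclusion_aware(doc, excluded):
    """Exclusion-aware path: decide on the name alone, then fetch only kept values."""
    out = {}
    for name in doc.field_names():
        if name in excluded:
            continue  # the value is never touched
        out[name] = doc.get_value(name)
    return out


raw = {"_id": 1, "blocks": list(range(40_000))}
generic_doc = LazyDocument(raw)
aware_doc = LazyDocument(raw)

generic_result = project_generic(generic_doc, {"blocks"})
aware_result = project_exclusion_aware(aware_doc, {"blocks"})

print(generic_result == aware_result)  # True
print(sorted(generic_doc.cache))       # ['_id', 'blocks']
print(sorted(aware_doc.cache))         # ['_id']
```

Both functions return the same projected document, but only the generic one ends up with the large excluded array copied into the cache.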
Recently, two fast paths have been added for exclusion projections, both of which do a direct BSON -> BSON transform: SERVER-61284 (ProjectionSimple) and SERVER-70353, which adds a fast path in ExclusionProjectionExecutor. Both went into 6.2 and make this problem harder to hit, since the fast paths are used more often than the generic path, which has the bug. On older versions, however, it is still very easy to run into. Anyone upgrading from 4.2 -> 4.4 is likely to see this issue.
For example, here is a test script which creates one document with a field, blocks, that is a 40,000-element array, then runs a query that excludes only the blocks field. Since blocks is such a large field, the (wasted) time spent loading it into the cache dominates the runtime on 4.4:
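A reproduction along these lines can be sketched in Python with pymongo. Only the field name blocks and the 40,000-element length come from the description above; the connection URI, the database and collection names, the shape of the array elements, and the run count are all assumptions made for illustration.

```python
import time

ARRAY_LEN = 40_000  # length of the blocks array described above


def build_document(n=ARRAY_LEN):
    # The element shape is an assumption; only the array length is given.
    return {"_id": 1, "blocks": [{"i": i, "payload": "x" * 16} for i in range(n)]}


EXCLUDE_BLOCKS = {"blocks": 0}  # exclusion projection: drop blocks, keep the rest


def time_exclusion(coll, runs=100):
    """Return the average seconds per find() that excludes only blocks."""
    start = time.perf_counter()
    for _ in range(runs):
        list(coll.find({}, EXCLUDE_BLOCKS))  # drain the cursor
    return (time.perf_counter() - start) / runs


def main(uri="mongodb://localhost:27017"):
    from pymongo import MongoClient  # third-party driver, imported lazily

    coll = MongoClient(uri)["test"]["exclusion_repro"]  # hypothetical names
    coll.drop()
    coll.insert_one(build_document())
    print(f"avg seconds per query: {time_exclusion(coll):.6f}")
```

Calling main() once against a 4.4 mongod and once against a 6.2 or later mongod gives the two timings to compare. The measurement includes driver and network overhead, so it is only a rough indicator of server-side cost.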
Compared with a run on 6.2 or later, the 4.4 run is much slower (around 30x on my machine).
In short, on affected versions, an exclusion projection requires us to completely copy and shred the fields we don't want to keep.
This ticket tracks the work of fixing the problem in the default ExclusionProjectionExecutor path. Whether we also want to backport the new fast paths to older versions is a separate question.