-
Type: New Feature
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Labels:None
-
Query Optimization
The agg-query-comparison_read_commands workload (categorized as a high-value workload) issues the following query:
find({x: {$gt: 500}, y: {$gt: 500}}, {x: 1, y: 1, _id: 0}).sort({y: 1})
with an index on
{x: 1, y: 1}
The plan that we generate looks like
queryPlan: { stage: 'SORT', planNodeId: 3, sortPattern: { y: 1 }, memLimit: 104857600, type: 'default', inputStage: { stage: 'PROJECTION_COVERED', planNodeId: 2, transformBy: { _id: false, x: true, y: true }, inputStage: { stage: 'IXSCAN', planNodeId: 1, keyPattern: { x: 1, y: 1 }, indexName: 'x_1_y_1', isMultiKey: false, multiKeyPaths: { x: [], y: [] }, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: 'forward', indexBounds: { x: [ '(500, inf.0]' ], y: [ '(500, inf.0]' ] } } } },
The plan performs a blocking sort on y. This throws away a potentially useful property of the input stream; for each unique value of x, the input is already sorted on y. If each value of x has many values of y, a better plan may be to perform a merge sort of the existing result set of that of the current value of x we are considering in the index scan. But in the worse case where each value of x has only one document, this approach is worse than a blocking sort.