-
Type:
New Feature
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
None
-
None
-
None
-
None
-
None
-
None
-
None
The agg-query-comparison_read_commands workload (categorized as a high-value workload) issues the following query:
find({x: {$gt: 500}, y: {$gt: 500}}, {x: 1, y: 1, _id: 0}).sort({y: 1})
with an index on
{x: 1, y: 1}
The plan that we generate looks like
queryPlan: {
stage: 'SORT',
planNodeId: 3,
sortPattern: { y: 1 },
memLimit: 104857600,
type: 'default',
inputStage: {
stage: 'PROJECTION_COVERED',
planNodeId: 2,
transformBy: { _id: false, x: true, y: true },
inputStage: {
stage: 'IXSCAN',
planNodeId: 1,
keyPattern: { x: 1, y: 1 },
indexName: 'x_1_y_1',
isMultiKey: false,
multiKeyPaths: { x: [], y: [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { x: [ '(500, inf.0]' ], y: [ '(500, inf.0]' ] }
}
}
},
The plan performs a blocking sort on y. This throws away a potentially useful property of the input stream; for each unique value of x, the input is already sorted on y. If each value of x has many values of y, a better plan may be to perform a merge sort of the existing result set of that of the current value of x we are considering in the index scan. But in the worse case where each value of x has only one document, this approach is worse than a blocking sort.