Details
-
Improvement
-
Status: Backlog
-
Major - P3
-
Resolution: Unresolved
-
None
-
None
-
Query Optimization
-
Query Optimization 2021-05-17
Description
This is an extension of SERVER-9507, which allowed $group to be executed using a DISTINCT_SCAN stage. A distinct command over a multikey field can DISTINCT_SCAN a multikey index:
> db.c.drop()
|
true
|
> db.c.createIndex({a: 1})
|
{
|
"createdCollectionAutomatically" : true,
|
"numIndexesBefore" : 1,
|
"numIndexesAfter" : 2,
|
"ok" : 1
|
}
|
> db.c.insert({a: [1, 2, 3]})
|
WriteResult({ "nInserted" : 1 })
|
> db.c.insert({a: [2, 3, 4]})
|
WriteResult({ "nInserted" : 1 })
|
> db.c.explain().distinct("a")
|
{
|
"queryPlanner" : {
|
"plannerVersion" : 1,
|
"namespace" : "test.c",
|
"indexFilterSet" : false,
|
"parsedQuery" : {
|
|
},
|
"queryHash" : "840B4B0E",
|
"winningPlan" : {
|
"stage" : "PROJECTION",
|
"transformBy" : {
|
"_id" : 0,
|
"a" : 1
|
},
|
"inputStage" : {
|
"stage" : "DISTINCT_SCAN",
|
"keyPattern" : {
|
"a" : 1
|
},
|
"indexName" : "a_1",
|
"isMultiKey" : true,
|
"multiKeyPaths" : {
|
"a" : [
|
"a"
|
]
|
},
|
"isUnique" : false,
|
"isSparse" : false,
|
"isPartial" : false,
|
"indexVersion" : 2,
|
"direction" : "forward",
|
"indexBounds" : {
|
"a" : [
|
"[MinKey, MaxKey]"
|
]
|
}
|
}
|
},
|
"rejectedPlans" : [ ]
|
},
|
"serverInfo" : {
|
"host" : "storchbox",
|
"port" : 27017,
|
"version" : "0.0.0",
|
"gitVersion" : "unknown"
|
},
|
"ok" : 1
|
}
|
When the same distinct operation is expressed using the aggregation framework, however, the $unwind inhibits the DISTINCT_SCAN optimization added in SERVER-9507:
> db.c.explain().aggregate([{$unwind: {path: "$a", preserveNullAndEmptyArrays: true}}, {$group: {_id: "$a"}}])
|
{
|
"stages" : [
|
{
|
"$cursor" : {
|
"query" : {
|
|
},
|
"fields" : {
|
"a" : 1,
|
"_id" : 0
|
},
|
"queryPlanner" : {
|
"plannerVersion" : 1,
|
"namespace" : "test.c",
|
"indexFilterSet" : false,
|
"parsedQuery" : {
|
|
},
|
"queryHash" : "8B3D4AB8",
|
"winningPlan" : {
|
"stage" : "COLLSCAN",
|
"direction" : "forward"
|
},
|
"rejectedPlans" : [ ]
|
}
|
}
|
},
|
{
|
"$unwind" : {
|
"path" : "$a",
|
"preserveNullAndEmptyArrays" : true
|
}
|
},
|
{
|
"$group" : {
|
"_id" : "$a"
|
}
|
}
|
],
|
"ok" : 1
|
}
|
This task is to add logic in order to optimize $unwind-$group pipelines to use DISTINCT_SCAN when appropriate. It is closely related to SERVER-27494, which describes how $unwind can sometimes be optimized away even without generating a DISTINCT_SCAN plan.
Attachments
Issue Links
- is depended on by
-
SERVER-68567 distinct command on the Time-Series collection doesn't utilize the existing index
-
- Backlog
-
- is related to
-
SERVER-27494 Avoid unwind on multi-key index field for covered index aggregation
-
- Backlog
-
-
SERVER-9507 Optimize $sort+$group+$first pipeline to avoid full index scan
-
- Closed
-
- related to
-
SERVER-55112 Behaviour of distinct differs between collections and views
-
- Closed
-
-
SERVER-27915 Make $group with $addToSet accumulator use DISTINCT_SCAN when applicable
-
- Backlog
-