Description
The query planner currently generates DISTINCT_SCAN plans for distinct operations on dotted fields, if the distinct index is non-multikey. Due to SERVER-2104, distinct operations on dotted fields can never be covered, so DISTINCT_SCAN plans should never be generated for them.
See the following explain output for an example of such a distinct operation. The query planner generates a full index scan plan with a FETCH node, which is much more costly than a collection scan plan.
> db.foo.drop()
true
> db.foo.ensureIndex({"a.b":1})
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
> db.foo.explain().distinct('a.b')
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.foo",
        "indexFilterSet" : false,
        "parsedQuery" : {
        },
        "winningPlan" : {
            "stage" : "PROJECTION",
            "transformBy" : {
                "_id" : 0,
                "a.b" : 1
            },
            "inputStage" : {
                "stage" : "FETCH",
                "inputStage" : {
                    "stage" : "DISTINCT_SCAN",
                    "keyPattern" : {
                        "a.b" : 1
                    },
                    "indexName" : "a.b_1",
                    "isMultiKey" : false,
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 1,
                    "direction" : "forward",
                    "indexBounds" : {
                        "a.b" : [
                            "[MinKey, MaxKey]"
                        ]
                    }
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "rassi",
        "port" : 27017,
        "version" : "3.3.4",
        "gitVersion" : "2ab9b1fd7ea20319e2d9ddb0532234729877703f"
    },
    "ok" : 1
}
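As a rough way to spot this plan shape programmatically, the sketch below walks a winning plan and reports whether a DISTINCT_SCAN sits beneath a FETCH. It is plain JavaScript, not server code: the helper names `stageChain` and `hasUncoveredDistinctScan` are hypothetical, the plan object is a trimmed copy of the output above, and it assumes every stage has at most one child under `inputStage`.

```javascript
// Collect stage names from an explain() winning plan, outermost first.
// Assumption: each stage has at most one child, stored under "inputStage".
function stageChain(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  return stages;
}

// True if a DISTINCT_SCAN appears below a FETCH, i.e. the distinct is not covered.
// (Hypothetical helper for illustration only.)
function hasUncoveredDistinctScan(plan) {
  const stages = stageChain(plan);
  const fetchAt = stages.indexOf("FETCH");
  return fetchAt !== -1 && stages.indexOf("DISTINCT_SCAN", fetchAt + 1) !== -1;
}

// Trimmed copy of the winningPlan from the explain output above.
const winningPlan = {
  stage: "PROJECTION",
  inputStage: {
    stage: "FETCH",
    inputStage: {
      stage: "DISTINCT_SCAN",
      keyPattern: { "a.b": 1 },
      indexName: "a.b_1"
    }
  }
};

console.log(stageChain(winningPlan).join(" -> "));
console.log(hasUncoveredDistinctScan(winningPlan));
```

In the shell, the same check could presumably be applied to `db.foo.explain().distinct('a.b').queryPlanner.winningPlan`.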
Relatedly, the second paragraph of the method comment for the getDistinctNodeIndex() helper is incorrect:
 * Returns true if indices contains an index that can be used with DistinctNode (the "fast distinct
 * hack" node, which can be used only if there is an empty query predicate). Sets indexOut to the
 * array index of PlannerParams::indices. Look for the index for the fewest fields. Criteria for
 * suitable index is that the index cannot be special (geo, hashed, text, ...), and the index cannot
 * be a partial index.
 *
 * Multikey indices are not suitable for DistinctNode when the projection is on an array element.
 * Arrays are flattened in a multikey index which makes it impossible for the distinct scan stage
 * (plan stage generated from DistinctNode) to select the requested element by array index.
 *
 * Multikey indices cannot be used for the fast distinct hack if the field is dotted. Currently the
 * solution generated for the distinct hack includes a projection stage and the projection stage
 * cannot be covered with a dotted field.
The comment above does not correctly describe the behavior of distinct against an array. The semantics of distinct operations are such that arrays are flattened:
> db.foo.drop()
true
> db.foo.insert({a:1})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({a:[2,3]})
WriteResult({ "nInserted" : 1 })
> db.foo.distinct('a')
[ 1, 2, 3 ]
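The flattening semantics shown above can be modeled with a short plain-JavaScript sketch. This is a simplified illustration, not the server's implementation: `valuesAtPath` and `distinctValues` are hypothetical names, numeric path components (array positions) are not handled, and de-duplication relies on `Set`, so it is only meaningful for scalar values.

```javascript
// Resolve a path (already split on ".") against a value.
// An array at the end of the path contributes each of its elements;
// an array in the middle of the path is traversed element by element.
function valuesAtPath(value, parts) {
  if (parts.length === 0) {
    if (value === undefined) return [];
    return Array.isArray(value) ? value.slice() : [value];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Simplified model of distinct: gather flattened values, drop duplicates.
// Results are returned in first-seen order.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    for (const v of valuesAtPath(doc, field.split("."))) {
      seen.add(v);
    }
  }
  return [...seen];
}

// Same documents as the shell session above.
const docs = [{ a: 1 }, { a: [2, 3] }];
console.log(distinctValues(docs, "a")); // [ 1, 2, 3 ]
```

Because the array `[2, 3]` contributes its elements rather than the array itself, a distinct scan never needs to select an element by array position, which is the point the method comment gets wrong.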
Issue Links
- related to
  - SERVER-23561 Distinct on sub-documents array doesn't use index (Closed)
  - SERVER-2104 covered index should support dotted fields (Closed)