|
The query planner currently generates DISTINCT_SCAN plans for distinct operations on dotted fields, if the distinct index is non-multikey. Due to SERVER-2104, distinct operations on dotted fields can never be covered, so DISTINCT_SCAN plans should never be generated for them.
See, the following explain output for such an example distinct operation. The query planner generates a full index scan plan with a FETCH node, which is much more costly than a collection scan plan.
> db.foo.drop()
|
true
|
> db.foo.ensureIndex({"a.b":1})
|
{
|
"createdCollectionAutomatically" : true,
|
"numIndexesBefore" : 1,
|
"numIndexesAfter" : 2,
|
"ok" : 1
|
}
|
> db.foo.explain().distinct('a.b')
|
{
|
"queryPlanner" : {
|
"plannerVersion" : 1,
|
"namespace" : "test.foo",
|
"indexFilterSet" : false,
|
"parsedQuery" : {
|
|
},
|
"winningPlan" : {
|
"stage" : "PROJECTION",
|
"transformBy" : {
|
"_id" : 0,
|
"a.b" : 1
|
},
|
"inputStage" : {
|
"stage" : "FETCH",
|
"inputStage" : {
|
"stage" : "DISTINCT_SCAN",
|
"keyPattern" : {
|
"a.b" : 1
|
},
|
"indexName" : "a.b_1",
|
"isMultiKey" : false,
|
"isUnique" : false,
|
"isSparse" : false,
|
"isPartial" : false,
|
"indexVersion" : 1,
|
"direction" : "forward",
|
"indexBounds" : {
|
"a.b" : [
|
"[MinKey, MaxKey]"
|
]
|
}
|
}
|
}
|
},
|
"rejectedPlans" : [ ]
|
},
|
"serverInfo" : {
|
"host" : "rassi",
|
"port" : 27017,
|
"version" : "3.3.4",
|
"gitVersion" : "2ab9b1fd7ea20319e2d9ddb0532234729877703f"
|
},
|
"ok" : 1
|
}
|
Relatedly, the second paragraph of the method comment for the getDistinctNodeIndex() helper is incorrect:
* Returns true if indices contains an index that can be used with DistinctNode (the "fast distinct
|
* hack" node, which can be used only if there is an empty query predicate). Sets indexOut to the
|
* array index of PlannerParams::indices. Look for the index for the fewest fields. Criteria for
|
* suitable index is that the index cannot be special (geo, hashed, text, ...), and the index cannot
|
* be a partial index.
|
*
|
* Multikey indices are not suitable for DistinctNode when the projection is on an array element.
|
* Arrays are flattened in a multikey index which makes it impossible for the distinct scan stage
|
* (plan stage generated from DistinctNode) to select the requested element by array index.
|
*
|
* Multikey indices cannot be used for the fast distinct hack if the field is dotted. Currently the
|
* solution generated for the distinct hack includes a projection stage and the projection stage
|
* cannot be covered with a dotted field.
|
The comment above does not correctly describe the behavior of distinct against an array. The semantics of distinct operations are such that arrays are flattened:
> db.foo.drop()
|
true
|
> db.foo.insert({a:1})
|
WriteResult({ "nInserted" : 1 })
|
> db.foo.insert({a:[2,3]})
|
WriteResult({ "nInserted" : 1 })
|
> db.foo.distinct('a')
|
[ 1, 2, 3 ]
|
|