The query planner currently generates DISTINCT_SCAN plans for distinct operations on dotted fields, if the distinct index is non-multikey. Due to SERVER-2104, distinct operations on dotted fields can never be covered, so DISTINCT_SCAN plans should never be generated for them.
See the following explain output for an example of such a distinct operation. The query planner generates a full index scan plan with a FETCH node, which is much more costly than a collection scan plan.
> db.foo.drop()
true
> db.foo.ensureIndex({"a.b":1})
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
> db.foo.explain().distinct('a.b')
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.foo",
        "indexFilterSet" : false,
        "parsedQuery" : {
        },
        "winningPlan" : {
            "stage" : "PROJECTION",
            "transformBy" : {
                "_id" : 0,
                "a.b" : 1
            },
            "inputStage" : {
                "stage" : "FETCH",
                "inputStage" : {
                    "stage" : "DISTINCT_SCAN",
                    "keyPattern" : {
                        "a.b" : 1
                    },
                    "indexName" : "a.b_1",
                    "isMultiKey" : false,
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 1,
                    "direction" : "forward",
                    "indexBounds" : {
                        "a.b" : [
                            "[MinKey, MaxKey]"
                        ]
                    }
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "rassi",
        "port" : 27017,
        "version" : "3.3.4",
        "gitVersion" : "2ab9b1fd7ea20319e2d9ddb0532234729877703f"
    },
    "ok" : 1
}
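The rule proposed above could be modeled roughly as follows. This is only a sketch in plain JavaScript, not the server's C++ planner code: the function name `canUseDistinctScan`, the shape of the `index` object, and the simplification that the distinct field must be the leading key of the index are all assumptions made for illustration.

```javascript
// Hypothetical model of the proposed eligibility rule; NOT the server's implementation.
// `index` is an illustrative object of the form { keyPattern, special, partial }.
function canUseDistinctScan(field, index) {
  // Special (geo, hashed, text, ...) and partial indexes are not suitable.
  if (index.special || index.partial) {
    return false;
  }
  // Proposed change: a distinct on a dotted field cannot be covered,
  // so a DISTINCT_SCAN plan should not be generated for it.
  if (field.includes(".")) {
    return false;
  }
  // Simplifying assumption: the distinct field must be the leading index key.
  return Object.keys(index.keyPattern)[0] === field;
}

console.log(canUseDistinctScan("a.b", { keyPattern: { "a.b": 1 } })); // false
console.log(canUseDistinctScan("a", { keyPattern: { a: 1 } }));       // true
```

Under this model, the `a.b_1` index from the explain output above would be rejected for `distinct('a.b')`, leaving the planner to choose a different plan.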
Relatedly, the second paragraph of the method comment for the getDistinctNodeIndex() helper is incorrect:
 * Returns true if indices contains an index that can be used with DistinctNode (the "fast distinct
 * hack" node, which can be used only if there is an empty query predicate). Sets indexOut to the
 * array index of PlannerParams::indices. Look for the index for the fewest fields. Criteria for
 * suitable index is that the index cannot be special (geo, hashed, text, ...), and the index cannot
 * be a partial index.
 *
 * Multikey indices are not suitable for DistinctNode when the projection is on an array element.
 * Arrays are flattened in a multikey index which makes it impossible for the distinct scan stage
 * (plan stage generated from DistinctNode) to select the requested element by array index.
 *
 * Multikey indices cannot be used for the fast distinct hack if the field is dotted. Currently the
 * solution generated for the distinct hack includes a projection stage and the projection stage
 * cannot be covered with a dotted field.
The comment above does not correctly describe the behavior of distinct against an array. The semantics of distinct operations are such that arrays are flattened:
> db.foo.drop()
true
> db.foo.insert({a:1})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({a:[2,3]})
WriteResult({ "nInserted" : 1 })
> db.foo.distinct('a')
[ 1, 2, 3 ]
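The flattening semantics shown in the shell session can be mimicked outside the server with a few lines of plain JavaScript. This is an illustrative model only: `distinctValues` is a hypothetical helper, it handles top-level fields only, it flattens a single level of arrays, and it compares values by their JSON encoding rather than by the server's comparison rules.

```javascript
// Illustrative model of distinct semantics; not the server's implementation.
// Handles top-level fields only and flattens one level of arrays.
function distinctValues(docs, field) {
  const seen = new Set();
  const out = [];
  for (const doc of docs) {
    if (!(field in doc)) {
      continue; // documents without the field contribute nothing
    }
    const value = doc[field];
    // An array value contributes each of its elements individually.
    const candidates = Array.isArray(value) ? value : [value];
    for (const v of candidates) {
      const key = JSON.stringify(v); // assumption: JSON equality stands in for value equality
      if (!seen.has(key)) {
        seen.add(key);
        out.push(v);
      }
    }
  }
  return out;
}

const docs = [{ a: 1 }, { a: [2, 3] }];
console.log(distinctValues(docs, "a")); // [ 1, 2, 3 ]
```

Because each array element is returned as its own value, there is no notion of selecting "the requested element by array index" in a distinct on `a`.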
Related to:
- SERVER-23561 Distinct on sub-documents array doesn't use index (Closed)
- SERVER-2104 covered index should support dotted fields (Closed)