Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Querying
Labels:
- neweng

Operating System: ALL
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The query planner currently generates DISTINCT_SCAN plans for distinct operations on dotted fields as long as the index used for the distinct is not multikey. Due to SERVER-2104, distinct operations on dotted fields can never be covered, so DISTINCT_SCAN plans should never be generated for them.

See the following explain output for an example distinct operation on the dotted field "a.b". The query planner generates a full index scan plan (a DISTINCT_SCAN over the bounds [MinKey, MaxKey]) with a FETCH stage, which is much more costly than a collection scan plan.

> db.foo.drop()
true
> db.foo.ensureIndex({"a.b":1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.foo.explain().distinct('a.b')
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.foo",
		"indexFilterSet" : false,
		"parsedQuery" : {

		},
		"winningPlan" : {
			"stage" : "PROJECTION",
			"transformBy" : {
				"_id" : 0,
				"a.b" : 1
			},
			"inputStage" : {
				"stage" : "FETCH",
				"inputStage" : {
					"stage" : "DISTINCT_SCAN",
					"keyPattern" : {
						"a.b" : 1
					},
					"indexName" : "a.b_1",
					"isMultiKey" : false,
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 1,
					"direction" : "forward",
					"indexBounds" : {
						"a.b" : [
							"[MinKey, MaxKey]"
						]
					}
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "rassi",
		"port" : 27017,
		"version" : "3.3.4",
		"gitVersion" : "2ab9b1fd7ea20319e2d9ddb0532234729877703f"
	},
	"ok" : 1
}

Relatedly, the second paragraph of the method comment for the getDistinctNodeIndex() helper is incorrect:

 * Returns true if indices contains an index that can be used with DistinctNode (the "fast distinct
 * hack" node, which can be used only if there is an empty query predicate).  Sets indexOut to the
 * array index of PlannerParams::indices.  Look for the index for the fewest fields.  Criteria for
 * suitable index is that the index cannot be special (geo, hashed, text, ...), and the index cannot
 * be a partial index.
 *
 * Multikey indices are not suitable for DistinctNode when the projection is on an array element.
 * Arrays are flattened in a multikey index which makes it impossible for the distinct scan stage
 * (plan stage generated from DistinctNode) to select the requested element by array index.
 *
 * Multikey indices cannot be used for the fast distinct hack if the field is dotted.  Currently the
 * solution generated for the distinct hack includes a projection stage and the projection stage
 * cannot be covered with a dotted field.

The comment above does not correctly describe the behavior of distinct against an array. Distinct operations flatten arrays, so each array element is returned as a separate value:

> db.foo.drop()
true
> db.foo.insert({a:1})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({a:[2,3]})
WriteResult({ "nInserted" : 1 })
> db.foo.distinct('a')
[ 1, 2, 3 ]

Related to:
SERVER-23561 Distinct on sub-documents array doesn't use index (Closed)
SERVER-2104 covered index should support dotted fields (Closed)

Assignee: David Storch
Reporter: J Rassi (Inactive)
Participants: David Storch, J Rassi
Votes: 0
Watchers: 5

Created: Apr 06 2016 08:42:23 PM UTC
Updated: May 17 2016 04:06:38 PM UTC
Resolved: May 17 2016 04:06:35 PM UTC
