[SERVER-31444] Queries against multikey trailing fields of a compound 2d index are incorrectly covered, leading to incorrect results Created: 06/Oct/17  Updated: 06/Dec/22  Resolved: 13/Oct/17

Status: Closed
Project: Core Server
Component/s: Geo, Querying
Affects Version/s: 2.6.12, 3.0.15, 3.2.17, 3.4.9, 3.5.13
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: David Storch Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-21011 Certain queries against compound 2d/t... Closed
Related
related to SERVER-31530 $elemMatch over trailing field of a 2... Closed
is related to SERVER-23909 Allow "2d" indices to accept a collation Backlog
is related to SERVER-24095 Set a feature bit in the KVCatalog wh... Closed
is related to SERVER-15086 Allow for efficient range queries ove... Closed
Assigned Teams:
Query
Operating System: ALL
Participants:

 Description   

When indexing array fields, most indexes contain one key per element of the array, hence the terminology "multikey indexes" (documented here). However, compound "2d" indexes use a different format for the non-2d fields. Namely, all of the array elements are stored in a single key whose value is itself an array. Consider the following example:

> db.c.drop()
false
> db.c.createIndex({a: "2d", "b.c": 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.c.insert({a: [0, 0], b: [{c: 1}, {c: 2}]})
WriteResult({ "nInserted" : 1 })
> db.c.find().hint({a: "2d", "b.c": 1}).returnKey()
{ "a" : BinData(128,"wAAAAAAAAAA="), "b.c" : [ 1, 2 ] }

As you can see, the inserted document leads to just a single index key. The value of this index key for the "b.c" field is the array [1, 2].

Because of this index format, predicates against the trailing fields of an index are not used to generate bounds against the index. A predicate like {"b.c": {$eq: 2}} would normally result in point lookup in the index for the value 2. However, this would incorrectly miss the above document, because the index key value is the entire array [1, 2]. Instead, the predicate is attached as a filter to the IXSCAN stage, as you can see from the explain of the query below:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$eq: 2}}).explain()
{
	"queryPlanner" : {
...
			"inputStage" : {
				"stage" : "IXSCAN",
				"filter" : {
					"b.c" : {
						"$eq" : 2
					}
				},
				"keyPattern" : {
					"a" : "2d",
					"b.c" : 1
				},
				"indexName" : "a_2d_b.c_1",
				"isMultiKey" : false,
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[BinData(128, 3FFF300000000000), BinData(128, 3FFF3FFFFFFFFFFF)]",
						"[BinData(128, 3FFF400000000000), BinData(128, 3FFF7FFFFFFFFFFF)]",
						"[BinData(128, 3FFF800000000000), BinData(128, 3FFFBFFFFFFFFFFF)]",
						"[BinData(128, 3FFFC00000000000), BinData(128, 3FFFFFFFFFFFFFFF)]",
						"[BinData(128, 6AAA000000000000), BinData(128, 6AAA3FFFFFFFFFFF)]",
						"[BinData(128, 6AAA600000000000), BinData(128, 6AAA6FFFFFFFFFFF)]",
						"[BinData(128, 6AAA800000000000), BinData(128, 6AAABFFFFFFFFFFF)]",
						"[BinData(128, 6AAAC00000000000), BinData(128, 6AAAFFFFFFFFFFFF)]",
						"[BinData(128, 9555000000000000), BinData(128, 95553FFFFFFFFFFF)]",
						"[BinData(128, 9555400000000000), BinData(128, 95557FFFFFFFFFFF)]",
						"[BinData(128, 9555900000000000), BinData(128, 95559FFFFFFFFFFF)]",
						"[BinData(128, 9555C00000000000), BinData(128, 9555FFFFFFFFFFFF)]",
						"[BinData(128, C000000000000000), BinData(128, C0003FFFFFFFFFFF)]",
						"[BinData(128, C000400000000000), BinData(128, C0007FFFFFFFFFFF)]",
						"[BinData(128, C000800000000000), BinData(128, C000BFFFFFFFFFFF)]",
						"[BinData(128, C000C00000000000), BinData(128, C000CFFFFFFFFFFF)]"
					],
					"b.c" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
...
}

However, this is simply incorrect for certain kinds of queries due to their matching behavior over arrays. For instance, queries which check equality against arrays may return spurious results:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": [1, 2]});
{ "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }

Similarly, $type:"array" queries may return spurious results:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$type: "array"}});
{ "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }

You can see that these results are spurious by issuing the same queries without the $geoWithin predicate and observing that the result set is empty:

> db.c.find({"b.c": [1, 2]});
> db.c.find({"b.c": {$type: "array"}});

This problem is closely related to that reported in SERVER-21011. However, this one does not affect "text" indexes, because the trailing fields of "text" indexes may not contain arrays. The fix for this problem is also likely to be distinct from that for SERVER-21011.



 Comments   
Comment by David Storch [ 13/Oct/17 ]

After further investigation, it appears that this is a duplicate of SERVER-21011. The fix pushed under SERVER-21011 (744738bd23) resolves all of the known problems described here.

However, our investigation did turn up yet another problem related to the trailing fields of a "2d" index: see SERVER-31530.

Comment by David Storch [ 06/Oct/17 ]

The easiest way to fix this issue is to stop assigning predicates to the trailing fields of "2d" indexes wholesale. However, this fix would likely result in a loss of performance for some workloads (and would render compound "2d" indexes more or less useless). Another way to fix this issue would be to stop assigning predicates a trailing field of a "2d" index when that field can contain an array. This would involve enabling path-level multikey tracking, taking advantage of the infrastructure added for SERVER-15086. Such a fix would involve upgrade/downgrade considerations, since existing stable versions of the server cannot correctly handle multikey metadata information associated with "2d" indexes. We would likely handle this problem by adding a new feature bit to the KVCatalog's feature tracking system, along the same lines as SERVER-24095.

A third way to fix, and probably the most desirable (and most complex), is to change the index format for "2d" indexes. This would bring the planning code for "2d" in line with that for regular indexes, and would allow us to generate bounds over all "2d" indexes, even those whose non-2d fields contain arrays. However, this comes with it a slew of upgrade/downgrade and mixed version related concerns.

Curiously, this problem does not affect indexes whose "2d" field is itself multikey:

> db.c.drop();
> db.c.createIndex({a: "2d", "b.c": 1});
> db.c.insert({a: [[0, 0], [0, 1]], b: [{c: 1}, {c: 2}]});
WriteResult({ "nInserted" : 1 })
> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}});
{ "_id" : ObjectId("59d7c962f098d066aff4d4b9"), "a" : [ [ 0, 0 ], [ 0, 1 ] ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }
> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$type: "array"}});
// No results.
> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": [1, 2]});
// No results.

The reason is that the access planner already refrains from adding covered filters to an IXSCAN stage when the index is tagged as multikey:

https://github.com/mongodb/mongo/blob/e65d9be4895ac11837abc0d04007e2f83143d95f/src/mongo/db/query/planner_access.cpp#L1361-L1376

Generated at Thu Feb 08 04:27:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.