[SERVER-31530] $elemMatch over trailing field of a 2d index can miss results Created: 12/Oct/17  Updated: 30/Oct/23  Resolved: 11/Jan/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.12, 3.0.15, 3.2.17, 3.4.9, 3.5.13
Fix Version/s: 3.7.1

Type: Bug Priority: Critical - P2
Reporter: David Storch Assignee: Martin Neupauer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-31444 Queries against multikey trailing fie... Closed
is related to SERVER-21011 Certain queries against compound 2d/t... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

db.c.drop();
db.c.createIndex({a: "2d", "b.c": 1});
db.c.insert({a: [0, 0], b: [{c: 1}, {c: 2}]});
assert.eq(1, db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, b: {$elemMatch: {c: 1}}}).itcount());

Sprint: Query 2017-11-13, Query 2017-12-04, Query 2017-12-18, Query 2018-01-01, Query 2018-01-15
Participants:

 Description   

When indexing array fields, most indexes contain one key per element of the array, hence the terminology "multikey indexes" (documented here). However, compound "2d" indexes use a different format for the non-2d fields. Namely, all of the array elements are stored in a single key whose value is itself an array. Consider the following example:

> db.c.drop()
false
> db.c.createIndex({a: "2d", "b.c": 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.c.insert({a: [0, 0], b: [{c: 1}, {c: 2}]})
WriteResult({ "nInserted" : 1 })
> db.c.find().hint({a: "2d", "b.c": 1}).returnKey()
{ "a" : BinData(128,"wAAAAAAAAAA="), "b.c" : [ 1, 2 ] }

As you can see, the inserted document leads to just a single index key. The value of this index key for the "b.c" field is the array [1, 2].

Because of this index format, predicates against the trailing fields of an index typically are not used to generate bounds against the index. A predicate like {"b.c": {$eq: 2}} would normally result in point lookup in the index for the value 2. However, this would incorrectly miss the above document, because the index key value is the entire array [1, 2]. Instead, the predicate is attached as a filter to the IXSCAN stage, as you can see from the explain of the query below:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$eq: 2}}).explain()
{
	"queryPlanner" : {
...
			"inputStage" : {
				"stage" : "IXSCAN",
				"filter" : {
					"b.c" : {
						"$eq" : 2
					}
				},
				"keyPattern" : {
					"a" : "2d",
					"b.c" : 1
				},
				"indexName" : "a_2d_b.c_1",
				"isMultiKey" : false,
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[BinData(128, 3FFF300000000000), BinData(128, 3FFF3FFFFFFFFFFF)]",
						"[BinData(128, 3FFF400000000000), BinData(128, 3FFF7FFFFFFFFFFF)]",
						"[BinData(128, 3FFF800000000000), BinData(128, 3FFFBFFFFFFFFFFF)]",
						"[BinData(128, 3FFFC00000000000), BinData(128, 3FFFFFFFFFFFFFFF)]",
						"[BinData(128, 6AAA000000000000), BinData(128, 6AAA3FFFFFFFFFFF)]",
						"[BinData(128, 6AAA600000000000), BinData(128, 6AAA6FFFFFFFFFFF)]",
						"[BinData(128, 6AAA800000000000), BinData(128, 6AAABFFFFFFFFFFF)]",
						"[BinData(128, 6AAAC00000000000), BinData(128, 6AAAFFFFFFFFFFFF)]",
						"[BinData(128, 9555000000000000), BinData(128, 95553FFFFFFFFFFF)]",
						"[BinData(128, 9555400000000000), BinData(128, 95557FFFFFFFFFFF)]",
						"[BinData(128, 9555900000000000), BinData(128, 95559FFFFFFFFFFF)]",
						"[BinData(128, 9555C00000000000), BinData(128, 9555FFFFFFFFFFFF)]",
						"[BinData(128, C000000000000000), BinData(128, C0003FFFFFFFFFFF)]",
						"[BinData(128, C000400000000000), BinData(128, C0007FFFFFFFFFFF)]",
						"[BinData(128, C000800000000000), BinData(128, C000BFFFFFFFFFFF)]",
						"[BinData(128, C000C00000000000), BinData(128, C000CFFFFFFFFFFF)]"
					],
					"b.c" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
...
}

When the predicate over the trailing field is an $elemMatch, however, the planner incorrectly generates bounds:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, b: {$elemMatch: {c: 1}}}).explain()
{
	"queryPlanner" : {
...
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"a" : "2d",
					"b.c" : 1
				},
				"indexName" : "a_2d_b.c_1",
				"isMultiKey" : false,
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"a" : [
						"[BinData(128, 3FFF300000000000), BinData(128, 3FFF3FFFFFFFFFFF)]",
						"[BinData(128, 3FFF400000000000), BinData(128, 3FFF7FFFFFFFFFFF)]",
						"[BinData(128, 3FFF800000000000), BinData(128, 3FFFBFFFFFFFFFFF)]",
						"[BinData(128, 3FFFC00000000000), BinData(128, 3FFFFFFFFFFFFFFF)]",
						"[BinData(128, 6AAA000000000000), BinData(128, 6AAA3FFFFFFFFFFF)]",
						"[BinData(128, 6AAA600000000000), BinData(128, 6AAA6FFFFFFFFFFF)]",
						"[BinData(128, 6AAA800000000000), BinData(128, 6AAABFFFFFFFFFFF)]",
						"[BinData(128, 6AAAC00000000000), BinData(128, 6AAAFFFFFFFFFFFF)]",
						"[BinData(128, 9555000000000000), BinData(128, 95553FFFFFFFFFFF)]",
						"[BinData(128, 9555400000000000), BinData(128, 95557FFFFFFFFFFF)]",
						"[BinData(128, 9555900000000000), BinData(128, 95559FFFFFFFFFFF)]",
						"[BinData(128, 9555C00000000000), BinData(128, 9555FFFFFFFFFFFF)]",
						"[BinData(128, C000000000000000), BinData(128, C0003FFFFFFFFFFF)]",
						"[BinData(128, C000400000000000), BinData(128, C0007FFFFFFFFFFF)]",
						"[BinData(128, C000800000000000), BinData(128, C000BFFFFFFFFFFF)]",
						"[BinData(128, C000C00000000000), BinData(128, C000CFFFFFFFFFFF)]"
					],
					"b.c" : [
						"[1.0, 1.0]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
...
}

As a result, this query misses the matching document:

> db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, b: {$elemMatch: {c: 1}}});
// No results.



 Comments   
Comment by Githook User [ 10/Jan/18 ]

Author:

{'email': 'martin.neupauer@mongodb.com', 'name': 'Martin Neupauer', 'username': 'MartinNeupauer'}

Message: SERVER-31530 Certain queries using 2d index could incorrecly compute
boundaries delivering incorrect results. The root cause was in incorrectly
propagating index tags during the query plan construction.
Branch: master
https://github.com/mongodb/mongo/commit/37cb7ea09393e88662e3139bd20fa29e59f2a1a3

Comment by Martin Neupauer [ 06/Dec/17 ]

The fixing up of sort order is one approach.
The 2d index uses different format for multikey that is not compatible with computing seek boundaries. It seems that we cannot use non-2d keys at all for seeking.
One possible solution is to completely disable tagging of non-2d keys in the index selection. The query plan would then have an extra filter operation on top of the index seek. This solution is relatively easy and straightforward.
The second option is to push the non-2d keys to storage engine as sargable predicates, if possible.

Comment by Tess Avitabile (Inactive) [ 12/Oct/17 ]

I think this is related to the order in which MatchExpression nodes are processed by the access planner. In the non-elemMatch case, the geo node is processed first. In the elemMatch case, the geo node is processed second. I expect that the access planner thinks the geo node will be processed first, and when the elemMatch node is processed first, it fails to clear the elemMatch node's bounds when it processes the geo node.

The reason the elemMatch node is processed first is because its index tag has position 0, since we do not populate the position of the parent tag when tagging the tree for sort. This means that it will sort equal to the geo node in the TagComparison function. The regular equality node gets position 1, so it sorts after the geo node.

When I try sorting GEO nodes first in TagComparison, the elemMatch query has bounds [MinKey, MaxKey] for the elemMatch node and it is answered correctly. Another solution would be to propagate index positions to the parent in tagForSort().

Generated at Thu Feb 08 04:27:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.