Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-31444

Queries against multikey trailing fields of a compound 2d index are incorrectly covered, leading to incorrect results

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: 2.6.12, 3.0.15, 3.2.17, 3.4.9, 3.5.13
    • Component/s: Geo, Querying
    • Labels:
      None
    • Query
    • ALL

      When indexing array fields, most indexes contain one key per element of the array, hence the terminology "multikey indexes" (documented here). However, compound "2d" indexes use a different format for the non-2d fields. Namely, all of the array elements are stored in a single key whose value is itself an array. Consider the following example:

      > db.c.drop()
      false
      > db.c.createIndex({a: "2d", "b.c": 1})
      {
      	"createdCollectionAutomatically" : true,
      	"numIndexesBefore" : 1,
      	"numIndexesAfter" : 2,
      	"ok" : 1
      }
      > db.c.insert({a: [0, 0], b: [{c: 1}, {c: 2}]})
      WriteResult({ "nInserted" : 1 })
      > db.c.find().hint({a: "2d", "b.c": 1}).returnKey()
      { "a" : BinData(128,"wAAAAAAAAAA="), "b.c" : [ 1, 2 ] }
      

      As you can see, the inserted document leads to just a single index key. The value of this index key for the "b.c" field is the array [1, 2].

      Because of this index format, predicates against the trailing fields of an index are not used to generate bounds against the index. A predicate like {"b.c": {$eq: 2}} would normally result in point lookup in the index for the value 2. However, this would incorrectly miss the above document, because the index key value is the entire array [1, 2]. Instead, the predicate is attached as a filter to the IXSCAN stage, as you can see from the explain of the query below:

      > db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$eq: 2}}).explain()
      {
      	"queryPlanner" : {
      ...
      			"inputStage" : {
      				"stage" : "IXSCAN",
      				"filter" : {
      					"b.c" : {
      						"$eq" : 2
      					}
      				},
      				"keyPattern" : {
      					"a" : "2d",
      					"b.c" : 1
      				},
      				"indexName" : "a_2d_b.c_1",
      				"isMultiKey" : false,
      				"isUnique" : false,
      				"isSparse" : false,
      				"isPartial" : false,
      				"indexVersion" : 2,
      				"direction" : "forward",
      				"indexBounds" : {
      					"a" : [
      						"[BinData(128, 3FFF300000000000), BinData(128, 3FFF3FFFFFFFFFFF)]",
      						"[BinData(128, 3FFF400000000000), BinData(128, 3FFF7FFFFFFFFFFF)]",
      						"[BinData(128, 3FFF800000000000), BinData(128, 3FFFBFFFFFFFFFFF)]",
      						"[BinData(128, 3FFFC00000000000), BinData(128, 3FFFFFFFFFFFFFFF)]",
      						"[BinData(128, 6AAA000000000000), BinData(128, 6AAA3FFFFFFFFFFF)]",
      						"[BinData(128, 6AAA600000000000), BinData(128, 6AAA6FFFFFFFFFFF)]",
      						"[BinData(128, 6AAA800000000000), BinData(128, 6AAABFFFFFFFFFFF)]",
      						"[BinData(128, 6AAAC00000000000), BinData(128, 6AAAFFFFFFFFFFFF)]",
      						"[BinData(128, 9555000000000000), BinData(128, 95553FFFFFFFFFFF)]",
      						"[BinData(128, 9555400000000000), BinData(128, 95557FFFFFFFFFFF)]",
      						"[BinData(128, 9555900000000000), BinData(128, 95559FFFFFFFFFFF)]",
      						"[BinData(128, 9555C00000000000), BinData(128, 9555FFFFFFFFFFFF)]",
      						"[BinData(128, C000000000000000), BinData(128, C0003FFFFFFFFFFF)]",
      						"[BinData(128, C000400000000000), BinData(128, C0007FFFFFFFFFFF)]",
      						"[BinData(128, C000800000000000), BinData(128, C000BFFFFFFFFFFF)]",
      						"[BinData(128, C000C00000000000), BinData(128, C000CFFFFFFFFFFF)]"
      					],
      					"b.c" : [
      						"[MinKey, MaxKey]"
      					]
      				}
      			}
      		},
      		"rejectedPlans" : [ ]
      	},
      ...
      }
      

      However, this is simply incorrect for certain kinds of queries due to their matching behavior over arrays. For instance, queries which check equality against arrays may return spurious results:

      > db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": [1, 2]});
      { "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }
      

      Similarly, $type:"array" queries may return spurious results:

      > db.c.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, "b.c": {$type: "array"}});
      { "_id" : ObjectId("59d7c44bf098d066aff4d4b6"), "a" : [ 0, 0 ], "b" : [ { "c" : 1 }, { "c" : 2 } ] }
      

      You can see that these results are spurious by issuing the same queries without the $geoWithin predicate and observing that the result set is empty:

      > db.c.find({"b.c": [1, 2]});
      > db.c.find({"b.c": {$type: "array"}});
      

      This problem is closely related to that reported in SERVER-21011. However, this one does not affect "text" indexes, because the trailing fields of "text" indexes may not contain arrays. The fix for this problem is also likely to be distinct from that for SERVER-21011.

            Assignee:
            backlog-server-query Backlog - Query Team (Inactive)
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: