Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-78260

Contained $or rewrite optimization incorrectly lifts a negation predicate outside an $elemMatch leading to missing query results

    • Query Optimization
    • Minor Change
    • ALL
    • QO 2023-07-10, QO 2023-07-24, QO 2023-08-07, QO 2023-08-21, QO 2023-09-04
    • 158

      In MQL, negation predicates behave differently depending on whether or not they are within an $elemMatch. Let's consider the behavior of the predicate {a: {$ne: 1}}. Here is an example of this predicate used inside an $elemMatch:

      > db.c.drop()
      > db.c.insert({_id: 1, arr: [{a: 1}, {a: 1}]})
      > db.c.insert({_id: 2, arr: [{a: 1}, {a: 2}]})
      > db.c.insert({_id: 3, arr: [{a: 2}, {a: 2}]})
      > db.c.find({arr: {$elemMatch: {a: {$ne: 1}}}})
      { "_id" : 2, "arr" : [ { "a" : 1 }, { "a" : 2 } ] }
      { "_id" : 3, "arr" : [ { "a" : 2 }, { "a" : 2 } ] }
      

      The semantics of this query are that the document matches if any array element matches the "a != 1" predicate. Let's compare this to a similar query with no $elemMatch using the same example data:

      > db.c.find({"arr.a": {$ne: 1}})
      { "_id" : 3, "arr" : [ { "a" : 2 }, { "a" : 2 } ] }
      

      This query finds documents where all array elements match the "a != 1" predicate. In slightly more technical terms, the former query applies the logical negation inside the implicit array traversal whereas the latter applies the negation outside the implicit array traversal.

      This means that one cannot perform either of the following query rewrites:

      Rewrite 1) {arr: {$elemMatch: {a: {$ne: 1}}}} -> {"arr.a": {$ne: 1}}
      Rewrite 2) {arr: {$elemMatch: {a: {$ne: 1}}}} -> {$and: [{arr: {$elemMatch: {a: {$ne: 1}}}}, {"arr.a": {$ne: 1}}
      

      I found a case where we are essentially doing the rewrite that I've notated as "Rewrite 2" above, leading to missing query results. Here's a simple repro script:

      (function() {
      "use strict";
      
      const coll = db.c;
      coll.drop();
      
      let indexKeyPattern = {"arr.a": 1, "c": 1, "d": 1};
      assert.commandWorked(coll.createIndex(indexKeyPattern));
      
      let predicate = {arr: {$elemMatch: {a: {$ne: 1}}}, $or: [{c: 4, d: 5}, {c: 6, d: 7}]};
      
      // This document should match.
      assert.commandWorked(coll.insert({_id: 0, arr: [{a: 1}, {a: 2}], c: 4, d: 5, a: 1}));
      assert.eq(coll.find(predicate).itcount(), 1);
      
      // Insert a bunch of data that doesn't match in order to make the multi-planner choose an index
      // union plan. The only matching document should still be the one inserted above with _id:0.
      for (let i = 0; i < 100; ++i) {
          assert.commandWorked(coll.insert({arr: [{a: 99}], c: 99, d: 99}));
      }
      
      // Gather explain output for inclusion in the error message when the test fails.
      let explain = coll.find(predicate).explain("queryPlanner");
      const winningPlan = explain.queryPlanner.winningPlan.queryPlan;
      
      // This assertion fails!
      assert.eq(coll.find(predicate).itcount(), 1, winningPlan);
      }());
      

      The problem relates to the so-called "contained $or pushdown" rewrite. Specifically, we have logic to rewrite a predicate like so:

      a AND (b OR c) -> (a AND b) OR (a AND c)
      

      We have a version of this rewrite that can lift a predicate out of an $elemMatch object before pushing it into the OR. That looks something like this:

      ELEM_MATCH_OBJECT(x AND y) AND (b OR c) -> ELEM_MATCH_OBJECT(x AND y) AND ((x AND b) OR (x AND c) -> 
      

      Doing this "lifting" is not correct if the predicate is a negation such as $ne given the array traversal semantics described above! In the repro script the example query is:

      {arr: {$elemMatch: {a: {$ne: 1}}}, $or: [{c: 4, d: 5}, {c: 6, d: 7}]};
      

      When the script fails it dumps the plan:

      uncaught exception: Error: [0] != [1] are not equal : {
      	"stage" : "FETCH",
      	"planNodeId" : 6,
      	"filter" : {
      		"arr" : {
      			"$elemMatch" : {
      				"a" : {
      					"$not" : {
      						"$eq" : 1
      					}
      				}
      			}
      		}
      	},
      	"inputStage" : {
      		"stage" : "OR",
      		"planNodeId" : 5,
      		"inputStages" : [
      			{
      				"stage" : "FETCH",
      				"planNodeId" : 2,
      				"filter" : {
      					"arr.a" : {
      						"$not" : {
      							"$eq" : 1
      						}
      					}
      				},
      				"inputStage" : {
      					"stage" : "IXSCAN",
      					"planNodeId" : 1,
      					"keyPattern" : {
      						"arr.a" : 1,
      						"c" : 1,
      						"d" : 1
      					},
      					"indexName" : "arr.a_1_c_1_d_1",
      					"isMultiKey" : true,
      					"multiKeyPaths" : {
      						"arr.a" : [
      							"arr"
      						],
      						"c" : [ ],
      						"d" : [ ]
      					},
      					"isUnique" : false,
      					"isSparse" : false,
      					"isPartial" : false,
      					"indexVersion" : 2,
      					"direction" : "forward",
      					"indexBounds" : {
      						"arr.a" : [
      							"[MinKey, 1.0)",
      							"(1.0, MaxKey]"
      						],
      						"c" : [
      							"[4.0, 4.0]"
      						],
      						"d" : [
      							"[5.0, 5.0]"
      						]
      					}
      				}
      			},
      			{
      				"stage" : "FETCH",
      				"planNodeId" : 4,
      				"filter" : {
      					"arr.a" : {
      						"$not" : {
      							"$eq" : 1
      						}
      					}
      				},
      				"inputStage" : {
      					"stage" : "IXSCAN",
      					"planNodeId" : 3,
      					"keyPattern" : {
      						"arr.a" : 1,
      						"c" : 1,
      						"d" : 1
      					},
      					"indexName" : "arr.a_1_c_1_d_1",
      					"isMultiKey" : true,
      					"multiKeyPaths" : {
      						"arr.a" : [
      							"arr"
      						],
      						"c" : [ ],
      						"d" : [ ]
      					},
      					"isUnique" : false,
      					"isSparse" : false,
      					"isPartial" : false,
      					"indexVersion" : 2,
      					"direction" : "forward",
      					"indexBounds" : {
      						"arr.a" : [
      							"[MinKey, 1.0)",
      							"(1.0, MaxKey]"
      						],
      						"c" : [
      							"[6.0, 6.0]"
      						],
      						"d" : [
      							"[7.0, 7.0]"
      						]
      					}
      				}
      			}
      		]
      	}
      } :
      

      Let's focus our attention on the first child of the OR stage:

      {
      				"stage" : "FETCH",
      				"planNodeId" : 2,
      				"filter" : {
                                               // INCORRECT! This $ne has been lifted out of an $elemMatch.
      					"arr.a" : {
      						"$not" : {
      							"$eq" : 1
      						}
      					}
      				},
      				"inputStage" : {
      					"stage" : "IXSCAN",
      					"planNodeId" : 1,
      					"keyPattern" : {
      						"arr.a" : 1,
      						"c" : 1,
      						"d" : 1
      					},
      					"indexName" : "arr.a_1_c_1_d_1",
      					"isMultiKey" : true,
      					"multiKeyPaths" : {
      						"arr.a" : [
      							"arr"
      						],
      						"c" : [ ],
      						"d" : [ ]
      					},
      					"isUnique" : false,
      					"isSparse" : false,
      					"isPartial" : false,
      					"indexVersion" : 2,
      					"direction" : "forward",
      					"indexBounds" : {
      						"arr.a" : [
      							"[MinKey, 1.0)",
      							"(1.0, MaxKey]"
      						],
      						"c" : [
      							"[4.0, 4.0]"
      						],
      						"d" : [
      							"[5.0, 5.0]"
      						]
      					}
      				}
      			},
      

      You'll notice that the filter associated with the FETCH stage is the $ne predicate that was lifted outside the $elemMatch object. That is an incorrect transformation of the input query which can cause the plan to miss query results.

      A final note: The repro script I've provided actually exposes two separate bugs related to contained $or pushdown and $elemMatch as of this writing. The first is related ticket SERVER-74954. I've generated the example explain output above from a build of the server that contains a work-in-progress fix for SERVER-74954.

            Assignee:
            svilen.mihaylov@mongodb.com Svilen Mihaylov (Inactive)
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: