Cache entries that use partial indexes in one branch of an OR can return wrong results

XMLWordPrintableJSON

    • Query Optimization
    • ALL
    • Hide
      const control_coll = db.control;
      const coll = db.experiment;
      
      assert(control_coll.drop());
      assert(coll.drop());
      
      const doc = [{_id: 1, m: 0, a: 0, b: 0}];
      
      assert.commandWorked(control_coll.insert(doc));
      assert.commandWorked(coll.insert(doc));
      
      const partialFilter = {
          $or: [{a: {$lt: ""}}, {_id: {$eq: 0}, a: {$eq: 0}}]
      };
      
      assert.commandWorked(coll.createIndex({a: 1}, {partialFilterExpression: partialFilter}));
      assert.commandWorked(coll.createIndex({a: 1, m: 1}, {partialFilterExpression: partialFilter}));
      assert.commandWorked(coll.createIndex({b: 1, a: 1}, {partialFilterExpression: partialFilter}));
      
      const q1 = [
          {$match: {$or: [{a: {$lt: ""}}, {_id: {$eq: 0}, a: {$eq: -1}}]}},
          {$sort: {b: 1}}
      ];
      const q2 = [
          {$match: {$or: [{a: {$lt: 1}}, {_id: {$eq: 0}, a: {$eq: 0}}]}},
          {$sort: {b: 1}}
      ];
      
      // Create a cache entry
      coll.aggregate(q1).toArray();
      coll.aggregate(q1).toArray();
      coll.aggregate(q1).toArray();
      
      const r1 = control_coll.aggregate(q2).toArray();
      const r2 = coll.aggregate(q2).toArray();
      assert.eq(r1.length, r2.length)
      // Missing document in the experiment collection
      
      jsTestLog(coll.explain().aggregate(q2))
      // Explain shows a collection scan, but in the logs the cache entry (not a collscan) is used.
      
      Show
      const control_coll = db.control; const coll = db.experiment; assert (control_coll.drop()); assert (coll.drop()); const doc = [{_id: 1, m: 0, a: 0, b: 0}]; assert .commandWorked(control_coll.insert(doc)); assert .commandWorked(coll.insert(doc)); const partialFilter = { $or: [{a: {$lt: ""}}, {_id: {$eq: 0}, a: {$eq: 0}}] }; assert .commandWorked(coll.createIndex({a: 1}, {partialFilterExpression: partialFilter})); assert .commandWorked(coll.createIndex({a: 1, m: 1}, {partialFilterExpression: partialFilter})); assert .commandWorked(coll.createIndex({b: 1, a: 1}, {partialFilterExpression: partialFilter})); const q1 = [ {$match: {$or: [{a: {$lt: ""}}, {_id: {$eq: 0}, a: {$eq: -1}}]}}, {$sort: {b: 1}} ]; const q2 = [ {$match: {$or: [{a: {$lt: 1}}, {_id: {$eq: 0}, a: {$eq: 0}}]}}, {$sort: {b: 1}} ]; // Create a cache entry coll.aggregate(q1).toArray(); coll.aggregate(q1).toArray(); coll.aggregate(q1).toArray(); const r1 = control_coll.aggregate(q2).toArray(); const r2 = coll.aggregate(q2).toArray(); assert .eq(r1.length, r2.length) // Missing document in the experiment collection jsTestLog(coll.explain().aggregate(q2)) // Explain shows a collection scan, but in the logs the cache entry (not a collscan) is used.
    • None
    • 3
    • TBD
    • None
    • None
    • None
    • None
    • None
    • None

      This is very similar to SERVER-102825, but I discussed with ben.shteinfeld@mongodb.com and we decided to file a new ticket since the root cause seems different. The issue from SERVER-102825 was fixed and the bug doesn't repro anymore.

      In SERVER-102825, the partial index filter and query have an $or, but the resulting plan was just a single index scan + fetch, not the union of two index scans.

      In this repro, the partial index filter and query also have an $or, but there is an index union. One subplan uses the _id index and the other subplan uses the "a" index with the partial filter. Then when a query with the same shape but different constants that don't satisfy the partial filter is run, we use the cache entry anyway.

            Assignee:
            Peter Volk
            Reporter:
            Matt Boros
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: