Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-84338

Top level $or queries may lead to invalid SBE plan cache entry which returns wrong results

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Critical - P2 Critical - P2
    • 7.2.1, 7.3.0-rc0, 7.0.6
    • Affects Version/s: 7.1.0, 7.0.4, 7.3.0-rc0, 7.2.0
    • Component/s: None
    • Query Optimization
    • Fully Compatible
    • ALL
    • v7.2, v7.0
    • Hide

      // This is the original reproduction steps provided by the customer.

      import { MongoClient, UUID } from "mongodb";
      const client = await MongoClient.connect("mongodb://localhost:27017/");
      const db = client.db("test");
      const coll = db.collection("test");for (let i = 0; i < 10; i++) {
        const _id = new UUID();
        const foo = new UUID();
        await coll.insertOne({ _id, foo });
        const query = { _id, foo };
        const singleResult = await coll.findOne(query);
      {{  const orResult = await coll.findOne(

      { $or: [query, query] }

      );}}
        const broken = !singleResult || !orResult;
        console.log(
          `${i
            .toFixed()
            .padStart(3)} single=${!Unable to render embedded object: File (singleResult} or=${) not found.!orResult} broken=${broken}`
        );
      }
      await db.dropDatabase();
      await client.close();

      -------------------------------------------

      With MongoDB 7.04

      node .\index.mjs
        0 single=true or=true broken=false
        1 single=true or=false broken=true
        2 single=true or=false broken=true
        3 single=true or=false broken=true
        4 single=true or=false broken=true
        5 single=true or=false broken=true
        6 single=true or=false broken=true
        7 single=true or=false broken=true
        8 single=true or=false broken=true
        9 single=true or=false broken=true

      -------------------------------------------

      With MongoDB 6.0.12

       node .\index.mjs
        0 single=true or=true broken=false
        1 single=true or=true broken=false
        2 single=true or=true broken=false
        3 single=true or=true broken=false
        4 single=true or=true broken=false
        5 single=true or=true broken=false
        6 single=true or=true broken=false
        7 single=true or=true broken=false
        8 single=true or=true broken=false
        9 single=true or=true broken=false

      Show
      // This is the original reproduction steps provided by the customer. import { MongoClient, UUID } from "mongodb"; const client = await MongoClient.connect("mongodb://localhost:27017/"); const db = client.db("test"); const coll = db.collection("test"); for (let i = 0; i < 10; i++) {   const _id = new UUID();   const foo = new UUID();   await coll.insertOne({ _id, foo });   const query = { _id, foo };   const singleResult = await coll.findOne(query); {{  const orResult = await coll.findOne( { $or: [query, query] } );}}   const broken = !singleResult || !orResult;   console.log(     `${i       .toFixed()       .padStart(3)} single=${! Unable to render embedded object: File (singleResult} or=${) not found. !orResult} broken=${broken}`   ); } await db.dropDatabase(); await client.close(); ------------------------------------------- With MongoDB 7.04 node .\index.mjs   0 single=true or=true broken=false   1 single=true or=false broken=true   2 single=true or=false broken=true   3 single=true or=false broken=true   4 single=true or=false broken=true   5 single=true or=false broken=true   6 single=true or=false broken=true   7 single=true or=false broken=true   8 single=true or=false broken=true   9 single=true or=false broken=true ------------------------------------------- With MongoDB 6.0.12  node .\index.mjs   0 single=true or=true broken=false   1 single=true or=true broken=false   2 single=true or=true broken=false   3 single=true or=true broken=false   4 single=true or=true broken=false   5 single=true or=true broken=false   6 single=true or=true broken=false   7 single=true or=true broken=false   8 single=true or=true broken=false   9 single=true or=true broken=false
    • QE 2023-12-25, QO 2024-01-08, QO 2024-01-22
    • 162

      Observable Problem

      When running a query/aggregation with a $or, the SBE plan cache entry may store an invalid plan. This results in subsequent executions of the query, with different constants, returning incorrect results. In particular, the cached plan may incorrectly filter out documents which should be returned. This problem is specific to SBE and the SBE plan cache. The classic engine is not affected.

      Under 7.0.4, 7.1,7.2, this bug affects find() commands, since SBE is on by default.

      In the v7.0 branch and master branch, this bug affects only aggregations, since SBE use is restricted to a smaller set of queries (see SERVER-83685).

      Reproduction with resmoke

      To reproduce the bug, see SERVER-84338-repro-2.js and run it with resmoke:

      python3 buildscripts/resmoke.py run --suites=no_passthrough_with_mongod SERVER-84338-repro-2.js

      The script will run several tests against mongod, reconfiguring its internalQueryFrameworkControl value to trySbeEngine, trySbeRestricted and forceClassicEngine. At the end, it will print out a list of queries and configurations that produced incorrect results.

      Under default configuration, which is trySbeRestricted, both 7.0 and master can produce incorrect results for an aggregation of this form:

      [
          {
              "$match" : {
                  "$or" : [
                      {
                          "_id" : 2,
                          "foo" : 2
                      },
                      {
                          "_id" : 2,
                          "foo" : 999
                      }
                  ]
              }
          },
          {
              "$count" : "count"
          }
      ]

       

      Explanation of the bug

      The bug has to do with the interaction between top-level $ors, the $or -> $in rewrite, and the plan cache.

      A query with the filter of the form {$or: {_id: 1, foo: 1}, {_id: 1, foo:999}} comes in, and is optimized/normalized, and parameterized as usual. Optimizing this expression does not change the expression (the _id predicate is not pulled up).

      Each branch of the OR is then planned separately, using an index scan on _id and a residual filter on foo.

      The planner then realizes that the scans generated beneath the OR are identical, and collapses them into one in QueryPlannerAccess::collapseEquivalentScans().

      When collapsing the scans, we combine the residual filters on foo into

      { $or: [ { foo: { $eq: 999.0 } }, { foo: { $eq: 1.0 } } ] } 

      We then optimize this expression here which converts the $or to a $in here. The optimized residual predicate is:

      {foo: {$in: [1, 999]}} 

      The resulting $in does not preserve the parameter IDs assigned to each predicate on foo. This is the crux of the problem: that parameterization information has effectively been "lost."

      At this point, the $in right hand side is effectively "baked" into the query plan which is cached. Subsequent executions will use 1, 999 in the residual predicate even if those are not the values provided.

      The attached SERVER-84338-repro-2.js contains more details, as well as some information about queries which are similar but do not exhibit the bug.

      Next Steps

      The remaining tasks here are:

      (a) Determine the expected/desired interaction between match expression optimization and parameterization. This was probably discussed during the plan cache development. The repro script includes a similar situation where the bug does not reproduce, but it is not clear whether this is intentional, or whether it is just by chance.

      (b) Fix the bug.

      (c) Determine a testing strategy so we can catch this type of bug on our own.

      (d) Investigate whether there are other cases where we can store an invalid plan in the cache. Any time we optimize a match expression after parameterization, we are at risk of this bug occurring, since the optimization can effectively "lose" parameter markings. I did a quick check of the planning code and couldn't find other places where we call MatchExpression::optimize() besides the on linked above when collapsing scans beneath an OR.

      Original Description Provided by Customer

      Given a query, `findOne({$or: [query, query]})` does not yield the same results as `findOne(query)`.

      It works fine in 6.0

      Description of Attached Files

      index.mjs: Original reproduction provided by customer

      SERVER-84338-repro.js: Reproduction script from Kyle Suarez, which runs the server under several configurations.

      SERVER-84338-repro-2.js: Final repro script from Ian B, which supersedes the prior two. This file includes detailed notes on the bug, along with some different queries which do not exhibit the bug. This file, along with the resmoke command in the description, should be all that is needed to reproduce and understand the bug.

        1. index.mjs
          0.7 kB
          Mika Fischer
        2. SERVER-84338-repro.js
          2 kB
          Kyle Suarez
        3. SERVER-84338-repro-2.js
          6 kB
          Ian Boros

            Assignee:
            ben.shteinfeld@mongodb.com Ben Shteinfeld
            Reporter:
            mika.fischer@zoopnet.de Mika Fischer
            Votes:
            0 Vote for this issue
            Watchers:
            19 Start watching this issue

              Created:
              Updated:
              Resolved: