Multikey transition race during CBR sampling results in tassert

    • Query Optimization
    • ALL

      Run the following jstest in the no_passthrough suite. This reproduces the race and the resulting tassert reliably.

      /**
       * Demonstrates a race condition between CBR sampling and a concurrent write that transitions an
       * index from non-multikey to multikey.
       *
       * The race:
       *  1. QueryPlanner::plan() reads the index as non-multikey; IndexScanNode::index.multikey = false.
       *  2. generateSample() starts a sequential collection scan.
       *  3. The scan yields: abandonSnapshot() releases the storage snapshot.
       *  4. A concurrent write inserts {a: [1, 2, 3]}, making the compound index multikey on path 'a'.
       *  5. The scan resumes on a newer snapshot and includes the array document in _sample.
       *  6. estimateIndexSeeks() dispatches to estimateNDV() (non-multikey path) because
       *     node->index.multikey is still false from step 1.
       *  7. countNDV() applies NonArrayProjector to each sample document.
       *  8. NonArrayProjector encounters the array and tasserts (11158502), crashing the server.
       *
       * @tags: [requires_fcv_83]
       */
      
      import {configureFailPoint} from "jstests/libs/fail_point_util.js";
      
      const conn = MongoRunner.runMongod({
          setParameter: {
              internalQueryFrameworkControl: "forceClassicEngine",
              featureFlagCostBasedRanker: true,
              internalQueryCBRCEMode: "samplingCE",
              // Sequential scan gives a deterministic, repeatable sample order and makes it easy
              // to ensure the yield window is wide enough to include the new array document.
              internalQuerySamplingBySequentialScan: true,
              // Yield on every document so the scan reliably yields inside the sampling executor.
              internalQueryExecYieldIterations: 1,
              internalQueryExecYieldPeriodMS: 0,
          },
      });
      assert.neq(null, conn, "mongod failed to start");
      
      const adminDB = conn.getDB("admin");
      const testDB = conn.getDB(jsTestName());
      const coll = testDB[jsTestName()];
      coll.drop();
      
      // Insert documents with scalar values so the index is initially non-multikey.
      const kNumDocs = 200;
      const docs = [];
      for (let i = 0; i < kNumDocs; i++) {
          docs.push({a: i, b: i, c: i});
      }
      assert.commandWorked(coll.insertMany(docs));
      assert.commandWorked(coll.createIndexes([{a: 1, b: 1, c: 1}]));
      
      // Step 1: Pause just before generateSample() so that we can arm setYieldAllLocksHang after
      // QueryPlanner::plan() has already built the IndexScanNode (node->index.multikey = false).
      // explain() is used so CBR ranks even a single-solution query (canSkipRanking requires !isExplain).
      const fpBeforeSampling = configureFailPoint(adminDB, "hangBeforeCBRSamplingGenerateSample");
      
      const awaitQuery = startParallelShell(
          `const testColl = db.getSiblingDB("${jsTestName()}")["${jsTestName()}"];
           testColl.find({a: {$lt: 50}, c: {$lt: 50}}).explain().finish();`,
          conn.port,
      );
      
      // Step 2: Wait for planning to complete (multikey = false now latched in IndexScanNode).
      fpBeforeSampling.wait();
      
      // Step 3: Arm the yield failpoint so we catch the first yield inside the sampling scan.
      // After this yield, abandonSnapshot() will have released the storage snapshot.
      const fpYield = configureFailPoint(adminDB, "setYieldAllLocksHang", {
          namespace: coll.getFullName(),
      });
      fpBeforeSampling.off();
      
      // Step 4: Wait for the sampling scan to yield.
      fpYield.wait();
      
      // Step 5: Insert an array-valued document, making the compound index multikey on path 'a'.
      // The insert commits before the scan resumes, so the new snapshot will include this document.
      assert.commandWorked(coll.insertOne({a: [1, 2, 3], b: 1}));
      
      // Step 6: Release the yield. The scan resumes on a new snapshot that includes {a: [1, 2, 3]}.
      // The non-multikey estimateNDV() path then runs NonArrayProjector over this document and
      // tasserts (11158502). After a fix, the query should complete normally and awaitQuery()
      // should return without error.
      fpYield.off();
      
      awaitQuery();
      
      MongoRunner.stopMongod(conn);
      
       

      samplingCE's sample generation can yield, which allows the following race:

      • A collection has an index that is initially not multikey.
      • The query planner runs on the collection, reads the catalog, and records the index as not multikey.
      • After that, but before samplingCE finishes generating the sample, a document is inserted into the collection that makes the index multikey.
      • During index seek estimation, CardinalityEstimator calls the non-multikey version of estimateNDV, because the plan still records the index as not multikey.
      • NonArrayProjector tasserts upon seeing the array in the sample.
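
      The sequence above can be modeled outside the server with a small, self-contained sketch. Everything in it is hypothetical: planQuery, projectNonArray, estimateNDV, and runRace are illustrative stand-ins, not server functions, and the multikey branch is only an assumed placeholder so that the dispatch has two sides. It shows how a flag copied at plan time goes stale once the sample is drawn from a newer snapshot. It is not a proposed fix.

```javascript
// Toy model of the race. None of these names are real server APIs.

// Plan-time view of the index: the multikey flag is copied once and not refreshed.
function planQuery(catalog) {
    return {indexMultikey: catalog.multikey};
}

// Stand-in for NonArrayProjector: rejects documents whose field holds an array.
function projectNonArray(doc, field) {
    const value = doc[field];
    if (Array.isArray(value)) {
        throw new Error(`unexpected array in field '${field}'`);
    }
    return value;
}

// Stand-in for the estimateNDV dispatch: the branch is chosen from the plan's flag.
function estimateNDV(plan, sample, field) {
    const distinct = new Set();
    for (const doc of sample) {
        if (plan.indexMultikey) {
            // Assumed multikey handling: count each array element separately.
            const value = doc[field];
            for (const v of Array.isArray(value) ? value : [value]) {
                distinct.add(v);
            }
        } else {
            distinct.add(projectNonArray(doc, field));
        }
    }
    return distinct.size;
}

// Replays the interleaving: plan, concurrent insert during the yield, then sample.
function runRace() {
    const catalog = {multikey: false};
    const collection = [{a: 1}, {a: 2}];

    const stalePlan = planQuery(catalog);  // planner latches multikey = false

    collection.push({a: [1, 2, 3]});       // concurrent write while the scan is yielded
    catalog.multikey = true;               // index becomes multikey

    const sample = collection.slice();     // scan resumes and sees the array document
    const freshPlan = planQuery(catalog);  // what a re-read of the catalog would report

    return {stalePlan, freshPlan, sample};
}
```

      In this model, calling estimateNDV with stalePlan throws on the array document, which mirrors the tassert, while the same sample evaluated with freshPlan takes the other branch and completes.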

            Assignee:
            Unassigned
            Reporter:
            Kartal Kaan Bozdogan