Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-43202

Aggregation system can continue trying to execute a query plan after being interrupted, leading to server-fatal invariant failure

    • Fully Compatible
    • ALL
    • v4.2
    • Hide

      I was able to reproduce this failure by first instrumenting the code with the following patch. The patch adds a bit of logging, but most importantly causes every query yield point to result in a ClientDisconnect error:

      diff --git a/src/mongo/db/pipeline/pipeline_d.cpp b/src/mongo/db/pipeline/pipeline_d.cpp
      index 12c95222fe..0b77a6be1d 100644
      --- a/src/mongo/db/pipeline/pipeline_d.cpp
      +++ b/src/mongo/db/pipeline/pipeline_d.cpp
      @@ -695,6 +695,7 @@ StatusWith<std::unique_ptr<PlanExecutor, PlanExecutor::Deleter>> PipelineD::prep
           }
      
           if (sortStage) {
      +        std::cout << "[dstorch] attempting to get a non-blocking sort plan" << std::endl;
               // See if the query system can provide a non-blocking sort.
               auto swExecutorSort =
                   attemptToGetExecutor(opCtx,
      @@ -708,6 +709,10 @@ StatusWith<std::unique_ptr<PlanExecutor, PlanExecutor::Deleter>> PipelineD::prep
                                        aggRequest,
                                        plannerOpts,
                                        matcherFeatures);
      +        if (!swExecutorSort.isOK()) {
      +            std::cout << "[dstorch] failed to get a non-blocking sort plan with code: "
      +                      << swExecutorSort.getStatus() << std::endl;
      +        }
      
               if (swExecutorSort.isOK()) {
                   // Success! Now see if the query system can also cover the projection.
      diff --git a/src/mongo/db/query/plan_yield_policy.cpp b/src/mongo/db/query/plan_yield_policy.cpp
      index 43590f27d7..f8a945958b 100644
      --- a/src/mongo/db/query/plan_yield_policy.cpp
      +++ b/src/mongo/db/query/plan_yield_policy.cpp
      @@ -84,6 +84,7 @@ void PlanYieldPolicy::resetTimer() {
       Status PlanYieldPolicy::yieldOrInterrupt(std::function<void()> whileYieldingFn) {
           invariant(_planYielding);
      
      +
           if (_policy == PlanExecutor::INTERRUPT_ONLY) {
               ON_BLOCK_EXIT([this]() { resetTimer(); });
               OperationContext* opCtx = _planYielding->getOpCtx();
      @@ -168,6 +169,8 @@ void PlanYieldPolicy::_yieldAllLocks(OperationContext* opCtx,
           // Attempt to check for interrupt while locks are not held, in order to discourage the
           // assumption that locks will always be held when a Plan Executor returns an error.
           if (_policy == PlanExecutor::YIELD_AUTO) {
      +        std::cout << "[dstorch] marking operation killed on client disconnect" << std::endl;
      +        opCtx->markKilled(ErrorCodes::ClientDisconnect);
               opCtx->checkForInterrupt();  // throws
           }
      

      Next, I ran the following script against a build of the instrumented server. This configures the server to yield frequently, and then runs an aggregate command which exercises the flawed code path.

      (function() {
      "use strict";
      
      db.c.drop();
      assert.commandWorked(db.c.insert({a: 1, b: 1, c: 1}));
      assert.commandWorked(db.c.insert({a: 1, b: 1, c: 1}));
      
      assert.commandWorked(db.c.createIndex({a: 1, b: 1}));
      assert.commandWorked(db.c.createIndex({a: 1, c: 1}));
      
      // This is needed to make sure that a yield point is reached.
      assert.commandWorked(db.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 1}));
      
      // This command triggers the crash.
      db.c.aggregate([{$match: {a: 1, b: 1, c: 1}}, {$sort: {a: 1}}]).itcount();
      }());
      
      Show
      I was able to reproduce this failure by first instrumenting the code with the following patch. The patch adds a bit of logging, but most importantly causes every query yield point to result in a ClientDisconnect error: diff --git a/src/mongo/db/pipeline/pipeline_d.cpp b/src/mongo/db/pipeline/pipeline_d.cpp index 12c95222fe..0b77a6be1d 100644 --- a/src/mongo/db/pipeline/pipeline_d.cpp +++ b/src/mongo/db/pipeline/pipeline_d.cpp @@ -695,6 +695,7 @@ StatusWith<std::unique_ptr<PlanExecutor, PlanExecutor::Deleter>> PipelineD::prep } if (sortStage) { + std::cout << "[dstorch] attempting to get a non-blocking sort plan" << std::endl; // See if the query system can provide a non-blocking sort. auto swExecutorSort = attemptToGetExecutor(opCtx, @@ -708,6 +709,10 @@ StatusWith<std::unique_ptr<PlanExecutor, PlanExecutor::Deleter>> PipelineD::prep aggRequest, plannerOpts, matcherFeatures); + if (!swExecutorSort.isOK()) { + std::cout << "[dstorch] failed to get a non-blocking sort plan with code: " + << swExecutorSort.getStatus() << std::endl; + } if (swExecutorSort.isOK()) { // Success! Now see if the query system can also cover the projection. diff --git a/src/mongo/db/query/plan_yield_policy.cpp b/src/mongo/db/query/plan_yield_policy.cpp index 43590f27d7..f8a945958b 100644 --- a/src/mongo/db/query/plan_yield_policy.cpp +++ b/src/mongo/db/query/plan_yield_policy.cpp @@ -84,6 +84,7 @@ void PlanYieldPolicy::resetTimer() { Status PlanYieldPolicy::yieldOrInterrupt(std::function<void()> whileYieldingFn) { invariant(_planYielding); + if (_policy == PlanExecutor::INTERRUPT_ONLY) { ON_BLOCK_EXIT([ this ]() { resetTimer(); }); OperationContext* opCtx = _planYielding->getOpCtx(); @@ -168,6 +169,8 @@ void PlanYieldPolicy::_yieldAllLocks(OperationContext* opCtx, // Attempt to check for interrupt while locks are not held, in order to discourage the // assumption that locks will always be held when a Plan Executor returns an error. if (_policy == PlanExecutor::YIELD_AUTO) { + std::cout << "[dstorch] marking operation killed on client disconnect" << std::endl; + opCtx->markKilled(ErrorCodes::ClientDisconnect); opCtx->checkForInterrupt(); // throws } Next, I ran the following script against a build of the instrumented server. This configures the server to yield frequently, and then runs an aggregate command which exercises the flawed code path. (function() { "use strict" ; db.c.drop(); assert .commandWorked(db.c.insert({a: 1, b: 1, c: 1})); assert .commandWorked(db.c.insert({a: 1, b: 1, c: 1})); assert .commandWorked(db.c.createIndex({a: 1, b: 1})); assert .commandWorked(db.c.createIndex({a: 1, c: 1})); // This is needed to make sure that a yield point is reached. assert .commandWorked(db.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 1})); // This command triggers the crash. db.c.aggregate([{$match: {a: 1, b: 1, c: 1}}, {$sort: {a: 1}}]).itcount(); }());
    • Query 2019-09-09, Query 2019-09-23, Query 2019-10-07
    • 11

      The aggregation subsystem may invoke the underlying multi-planning system multiple times in order to select its execution plan. For example, when there is a $sort it first attempts to construct a non-blocking $sort plan. If no such plan exists, it then invokes the plan selection code again without pushing down the $sort to the underlying PlanStage layer. See the code implementing this logic here.

      If during this process the operation is killed—because the client disconnected, or due to an explicit killOp command, or due to a conflicting catalog event such as a collection drop—then this process should terminate at the next yield point and the operation should stop running. However, there is incorrect logic that accidentally swallows the interruption error and attempts to keep executing the aggregation operation. Specifically, we have checks to propagate the error code QueryPlanKilled which misses other possible error codes, e.g. ClientDisconnect.

      Since an interrupt check happens at query yield points, the interruption error is propagated after locks have been relinquished. The consequence is that we attempt to keep executing the aggregation operation without holding the proper locks. This promptly triggers the following server-fatal invariant check, which makes an assertion that the necessary locks are held:

      https://github.com/mongodb/mongo/blob/cdf0a059437f1ed417a22b3c574e2f73afc04233/src/mongo/db/catalog/database_holder_impl.cpp#L73-L74

      When this scenario occurs, the server will shut down and and the logs will contain the following message, along with a backtrace:

      2019-09-06T13:37:05.575-0400 F  -        [conn1] Invariant failure opCtx->lockState()->isDbLockedForMode(db, MODE_IS) || (db.compare("local") == 0 && opCtx->lockState()->isLocked()) src/mongo/db/catalog/database_holder_impl.cpp 74
      

            Assignee:
            ian.boros@mongodb.com Ian Boros
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: