Core Server / SERVER-15019

Killing agg executor doesn't kill underlying executor

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.7.4
    • Fix Version/s: 2.7.8
    • Component/s: Querying
    • Labels:
      None
    • Operating System:
      ALL
    • Sprint:
      Query 2.7.8

      Description

      Running the dropIndexes command concurrently with an aggregate command can crash the server. This is a regression introduced in 2.7.4 by 7ffac7f3: the logic in PipelineRunner::kill() was removed and not replaced with analogous PlanExecutor logic. Related to SERVER-14969. Quoting a comment from that ticket:

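      // Start from an empty collection.
      db.foo.drop();
      // Build a 1 MiB string.
      var x='';
      for (var i=0; i<1024*1024; i++) {
          x+='x';
      }
      // Insert 256 documents of about 1 MiB each.
      for (var i=0; i<256; i++){
          db.foo.insert({x:x});
      }
      // In a parallel shell, repeatedly drop and recreate the index on {a: 1}.
      startParallelShell("for(;;) { db.foo.dropIndexes(); sleep(100); db.foo.ensureIndex({a:1}) }");
      // Meanwhile, run the aggregation in a loop and ignore any errors it throws.
      for (;;) {
          try {
              db.foo.aggregate([{$match: {x: /y/, a: null}}]);
          }
          catch(e) { }
      }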

      The sequence of events that causes the crash is:

        1) The aggregation operation acquires a read lock, gets a PipelineRunner with a document source pipeline stage (which will use an index), and releases the read lock.
        2) The dropIndexes operation acquires a write lock, drops the index, and releases the write lock.
        3) The aggregation acquires a read lock (in the document source pipeline stage), the document source pipeline stage attempts to read from the underlying runner, and the crash occurs (since the index has gone away).
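
      The three steps above can be replayed in a small single-threaded C++ model. This is only an illustrative sketch: CatalogSketch, IndexScanSketch, and scanSurvivesDropSketch are hypothetical names, not server classes, and the locks are represented by comments only. The index name "a_1" is the default name for an index on {a: 1}. Where the sketch returns false, the stack trace in this ticket shows the server failing an invariant in IndexDescriptor::_checkOk().

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a collection's index catalog.
struct CatalogSketch {
    std::set<std::string> indexes;
    bool hasIndex(const std::string& name) const { return indexes.count(name) > 0; }
};

// Hypothetical stand-in for an index scan that was planned against one index.
class IndexScanSketch {
public:
    IndexScanSketch(const CatalogSketch* catalog, std::string indexName)
        : _catalog(catalog), _indexName(std::move(indexName)) {}

    // Returns false if the index has disappeared since the scan was planned.
    bool work() const { return _catalog->hasIndex(_indexName); }

private:
    const CatalogSketch* _catalog;
    std::string _indexName;
};

// Replays the three steps in order and returns what the scan observes in step 3.
inline bool scanSurvivesDropSketch() {
    CatalogSketch catalog;
    catalog.indexes.insert("a_1");

    // 1) Under a read lock: plan the scan against "a_1", then release the lock.
    IndexScanSketch scan(&catalog, "a_1");
    assert(scan.work());  // the index is still present while the lock is held

    // 2) Under a write lock: dropIndexes removes the index.
    catalog.indexes.erase("a_1");

    // 3) Under a new read lock: the scan resumes, but its index is gone.
    return scan.work();
}
```

      Calling scanSurvivesDropSketch() returns false: nothing told the scan that its index was dropped between steps 1 and 3, which is the gap a kill or invalidate notification is meant to close.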

      ...

      The issue is more complicated in 2.7.6-pre, however. The Runner abstraction has been removed and replaced with PlanExecutor, and the PlanExecutor stage tree is not notified of kill() operations. It is still the case that the "user-facing" cursor is registered with the collection cursor cache, but even if kill() is invoked on the associated PlanExecutor, the API doesn't allow for the kill to be propagated down to the underlying executor; kill() on a PlanExecutor merely sets the "_killed" flag. The underlying executor needs to be told about the kill, because the parent executor may be in the middle of a getNext() call when the invalidate happens (note that executors with a PipelineProxyStage root execute under no lock; the locking is performed by DocumentSourceCursor when interacting with the underlying executor).
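
      The propagation that the paragraph above says is missing could look roughly like the following. This is a hedged sketch under stated assumptions, not the fix that shipped in 2.7.8 and not the PlanExecutor API: ExecutorSketch, setChild, and killReachesChildSketch are hypothetical names, documents are modeled as ints, and the sketch is single-threaded, so it does not model the locking done by DocumentSourceCursor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical executor used only for illustration.
class ExecutorSketch {
public:
    explicit ExecutorSketch(std::vector<int> docs) : _docs(std::move(docs)) {}

    // A parent executor records the executor it pulls from so kill() can reach it.
    void setChild(ExecutorSketch* child) { _child = child; }

    // Sets the local flag and also forwards the kill to the underlying executor.
    void kill() {
        _killed = true;
        if (_child) _child->kill();
    }

    bool isKilled() const { return _killed; }

    // Returns true and fills *out while results remain; false once exhausted or killed.
    bool getNext(int* out) {
        if (_killed) return false;
        if (_child) return _child->getNext(out);
        if (_pos >= _docs.size()) return false;
        *out = _docs[_pos++];
        return true;
    }

private:
    std::vector<int> _docs;
    std::size_t _pos = 0;
    bool _killed = false;
    ExecutorSketch* _child = nullptr;
};

// Kills the outer executor and reports whether the inner one was stopped too.
inline bool killReachesChildSketch() {
    ExecutorSketch inner(std::vector<int>{1, 2, 3});
    ExecutorSketch outer(std::vector<int>{});
    outer.setChild(&inner);

    int doc = 0;
    assert(outer.getNext(&doc) && doc == 1);  // normal operation before the kill

    outer.kill();  // e.g. triggered when an index the plan depends on is dropped
    return inner.isKilled() && !inner.getNext(&doc) && !outer.getNext(&doc);
}
```

      Calling killReachesChildSketch() returns true. If kill() only set the local flag, as the ticket describes for PlanExecutor, the inner executor would keep returning results after the outer one had been killed.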

      Here's a stack trace for the issue in 2.7.6-pre:

       mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x27) [0x1d4d869]
       mongod(mongo::logContext(char const*)+0x71) [0x1ce4480]
       mongod(mongo::invariantFailed(char const*, char const*, unsigned int)+0xB2) [0x1ccf821]
       mongod(mongo::IndexDescriptor::_checkOk() const+0x90) [0x1728fe6]
       mongod(mongo::IndexDescriptor::isMultikey() const+0x18) [0x14e3b20]
       mongod(mongo::IndexScan::initIndexScan()+0x4A) [0x16104ce]
       mongod(mongo::IndexScan::work(unsigned long*)+0xDC) [0x1610b2e]
       mongod(mongo::FetchStage::work(unsigned long*)+0xAA) [0x15fb70a]
       mongod(mongo::KeepMutationsStage::work(unsigned long*)+0x9A) [0x1612a06]
       mongod(mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::DiskLoc*)+0x59) [0x18d072b]
       mongod(mongo::DocumentSourceCursor::loadBatch()+0x365) [0x17bfd65]
       mongod(mongo::DocumentSourceCursor::getNext()+0x49) [0x17bf92d]
       mongod(mongo::PipelineProxyStage::getNextBson()+0x4E) [0x1635fce]
       mongod(mongo::PipelineProxyStage::work(unsigned long*)+0xFD) [0x1635b67]
       mongod(mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::DiskLoc*)+0x59) [0x18d072b]
       mongod(+0x115C3FE) [0x155c3fe]
       mongod(mongo::PipelineCommand::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)+0x809) [0x155d9fb]
       mongod(mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)+0x96) [0x15b2a6b]
       mongod(mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool)+0xBB8) [0x15b3a02]
       mongod(mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int)+0x4D1) [0x15b430a]

              People

              Assignee:
              david.storch David Storch
              Reporter:
              rassi J Rassi