Core Server / SERVER-40319

ClusterCursorManager::killCursorsWithMatchingSessions() can call get() on an uninitialized boost::optional

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.1.10
    • Affects Version/s: None
    • Component/s: Querying, Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:

      I've created a patch which instruments the server code with a sleep in order to reproduce this more reliably. Apply the following patch:

      diff --git a/src/mongo/db/kill_sessions_common.h b/src/mongo/db/kill_sessions_common.h
      index 95361aa5cf..7b24b579c8 100644
      --- a/src/mongo/db/kill_sessions_common.h
      +++ b/src/mongo/db/kill_sessions_common.h
      @@ -101,6 +101,9 @@ public:
                       ScopedKillAllSessionsByPatternImpersonator impersonator(_opCtx, *pattern);
      
                       auto cursors = mgr.getCursorsForSession(session);
      +                std::cout << "!!! sleeping after getting cursors for session" << std::endl;
      +                sleepFor(Seconds{10});
      +                std::cout << "!!! waking up" << std::endl;
                       for (const auto& id : cursors) {
                           try {
                               _eraser(mgr, id);
      

      Then, start a sharded cluster. Run the following against the mongos from one shell. This opens a logical session and creates a cursor within that session:

      var session = db.getMongo().startSession();
      var sessionDb = session.getDatabase(db.getName());
      sessionDb.c.insert({a: 1});
      sessionDb.c.insert({a: 1});
      sessionDb.c.insert({a: 1});
      var cursor = sessionDb.c.find().batchSize(2);
      cursor.next();
      

      From another shell, verify that there is an idle cursor, and obtain the lsid associated with the cursor:

      use admin
      db.aggregate([{$currentOp: {idleCursors: true}}, {$match: {type: "idleCursor"}}]);
      

      Then, from the second shell, issue the admin command to kill the session. This will hang for 10 seconds due to the instrumentation above.

       db.adminCommand({killSessions: [ { "id": <lsid> } ]});
      

      While the killSessions command is hanging, close the cursor in the first shell:

      cursor.close();
      

      In a debug build, this will trip an assertion in the boost::optional code checking that the caller does not attempt to call get() when the value of the optional is boost::none:

      s20005| mongos: src/third_party/boost-1.69.0/boost/optional/optional.hpp:1207: boost::optional::reference_type boost::optional<mongo::NamespaceString>::get() [T = mongo::NamespaceString]: Assertion `this->is_initialized()' failed.
      
    • Sprint: Query 2019-04-08

    • Description:

      ClusterCursorManager::killCursorsWithMatchingSessions() creates a lambda which calls ClusterCursorManager::getNamespaceForCursorId(). This function returns a boost::optional<NamespaceString>, which may be boost::none. However, the lambda calls get() on it unconditionally:

      https://github.com/mongodb/mongo/blob/60c0441f9ba3196eaabe38935333d17f1aff88f8/src/mongo/s/query/cluster_cursor_manager.cpp#L677

      In debug builds, this trips a process-fatal assertion. In non-debug builds it leads to undefined behavior.
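
      To make the failure mode concrete, here is a minimal standalone sketch that trips the same boost assertion. It is not the server code: lookupNamespace() is a hypothetical stand-in for getNamespaceForCursorId(), and only Boost is assumed.

      #include <boost/optional.hpp>
      #include <iostream>
      #include <string>

      // Hypothetical stand-in for getNamespaceForCursorId(): returns
      // boost::none once no cursors remain registered on the namespace.
      boost::optional<std::string> lookupNamespace(bool namespaceStillTracked) {
          if (namespaceStillTracked) {
              return std::string("test.c");
          }
          return boost::none;
      }

      int main() {
          boost::optional<std::string> nss = lookupNamespace(false);
          // Unconditional get() on an empty optional: with assertions enabled
          // (as in a debug build) this fails boost's is_initialized() check,
          // producing the same message as in this ticket; without assertions
          // it is undefined behavior.
          std::cout << nss.get() << std::endl;
          return 0;
      }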

      This situation can occur because killCursorsWithMatchingSessions() makes several separate calls into the mongos cursor manager, each of which acquires the cursor manager mutex independently. First, it uses getCursorsForSession() to gather the cursors associated with a logical session. Next, after dropping and reacquiring the mutex, it calls getNamespaceForCursorId() in the process of killing these cursors. If, in between these two steps, all of the cursors for one of the relevant namespaces die, then the cursor manager will no longer be tracking that namespace and getNamespaceForCursorId() will return boost::none. In the repro steps, this interleaving is achieved by issuing a killCursors command (via cursor.close()) concurrently with a killSessions command.
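
      For illustration, the following self-contained sketch models that interleaving. All names here (MiniCursorManager, registerCursor, killCursor) are hypothetical toys, not ClusterCursorManager's API; the lookup result is checked before use, which is the kind of guard the unconditional get() is missing.

      #include <boost/optional.hpp>
      #include <cstdint>
      #include <iostream>
      #include <map>
      #include <mutex>
      #include <string>

      // Toy cursor manager: each member function takes the mutex
      // independently, mirroring how killCursorsWithMatchingSessions()
      // makes separate locked calls into the real manager.
      class MiniCursorManager {
      public:
          void registerCursor(std::int64_t id, std::string ns) {
              std::lock_guard<std::mutex> lk(_mutex);
              _cursorIdToNs[id] = std::move(ns);
          }

          void killCursor(std::int64_t id) {
              std::lock_guard<std::mutex> lk(_mutex);
              _cursorIdToNs.erase(id);
          }

          boost::optional<std::string> getNamespaceForCursorId(std::int64_t id) {
              std::lock_guard<std::mutex> lk(_mutex);
              auto it = _cursorIdToNs.find(id);
              if (it == _cursorIdToNs.end()) {
                  return boost::none;  // cursor no longer tracked
              }
              return it->second;
          }

      private:
          std::mutex _mutex;
          std::map<std::int64_t, std::string> _cursorIdToNs;
      };

      int main() {
          MiniCursorManager mgr;
          mgr.registerCursor(123, "test.c");

          std::int64_t cursorId = 123;  // step 1: gathered under the mutex

          mgr.killCursor(cursorId);  // a concurrent killCursors lands in the gap

          auto nss = mgr.getNamespaceForCursorId(cursorId);  // step 2
          if (!nss) {
              // The namespace vanished between the two steps; calling
              // nss.get() here would be the bug described above.
              std::cout << "cursor already gone; skipping" << std::endl;
              return 0;
          }
          std::cout << "would kill cursor on " << *nss << std::endl;
          return 0;
      }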

            Assignee: David Storch (david.storch@mongodb.com)
            Reporter: David Storch (david.storch@mongodb.com)
            Votes: 0
            Watchers: 1

              Created:
              Updated:
              Resolved: