Core Server / SERVER-40319

ClusterCursorManager::killCursorsWithMatchingSessions() can call get() on an uninitialized boost::optional

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.1.10
    • Affects Version/s: None
    • Component/s: Querying, Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:

      I've created a patch which instruments the server code with a sleep in order to reproduce this more reliably. Apply the following patch:

      diff --git a/src/mongo/db/kill_sessions_common.h b/src/mongo/db/kill_sessions_common.h
      index 95361aa5cf..7b24b579c8 100644
      --- a/src/mongo/db/kill_sessions_common.h
      +++ b/src/mongo/db/kill_sessions_common.h
      @@ -101,6 +101,9 @@ public:
                       ScopedKillAllSessionsByPatternImpersonator impersonator(_opCtx, *pattern);
      
                       auto cursors = mgr.getCursorsForSession(session);
      +                std::cout << "!!! sleeping after getting cursors for session" << std::endl;
      +                sleepFor(Seconds{10});
      +                std::cout << "!!! waking up" << std::endl;
                       for (const auto& id : cursors) {
                           try {
                               _eraser(mgr, id);
      

      Then, start a sharded cluster. Run the following against the mongos from one shell. This opens a logical session and creates a cursor within that session:

      var session = db.getMongo().startSession();
      var sessionDb = session.getDatabase(db.getName());
      sessionDb.c.insert({a: 1});
      sessionDb.c.insert({a: 1});
      sessionDb.c.insert({a: 1});
      var cursor = sessionDb.c.find().batchSize(2);
      cursor.next();
      

      From another shell, verify that there is an idle cursor, and obtain the lsid associated with the cursor:

      use admin
      db.aggregate([{$currentOp: {idleCursors: true}}, {$match: {type: "idleCursor"}}]);
      

      Then, from the second shell, issue the admin command to kill the session. This will hang for 10 seconds due to the instrumentation above.

       db.adminCommand({killSessions: [ { "id": <lsid> } ]});
      

      While the killSessions command is hanging, close the cursor in the first shell:

      cursor.close();
      

      In a debug build, this will trip an assertion in the boost::optional code checking that the caller does not attempt to call get() when the value of the optional is boost::none:

      s20005| mongos: src/third_party/boost-1.69.0/boost/optional/optional.hpp:1207: boost::optional::reference_type boost::optional<mongo::NamespaceString>::get() [T = mongo::NamespaceString]: Assertion `this->is_initialized()' failed.
      
    • Sprint: Query 2019-04-08

    • Description:

      ClusterCursorManager::killCursorsWithMatchingSessions() creates a lambda which calls ClusterCursorManager::getNamespaceForCursorId(). This function returns a boost::optional<NamespaceString>, which may be boost::none. However, the lambda calls get() on it unconditionally:

      https://github.com/mongodb/mongo/blob/60c0441f9ba3196eaabe38935333d17f1aff88f8/src/mongo/s/query/cluster_cursor_manager.cpp#L677

      In debug builds, this trips a process-fatal assertion. In non-debug builds it leads to undefined behavior.
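
      To make the failure mode concrete, here is a minimal standalone sketch that trips the same boost assertion. It is not the server code: lookupNamespace() is a hypothetical stand-in for getNamespaceForCursorId(), and only Boost is assumed.

      #include <boost/optional.hpp>
      #include <iostream>
      #include <string>

      // Hypothetical stand-in for getNamespaceForCursorId(): returns
      // boost::none once no cursors remain registered on the namespace.
      boost::optional<std::string> lookupNamespace(bool namespaceStillTracked) {
          if (namespaceStillTracked) {
              return std::string("test.c");
          }
          return boost::none;
      }

      int main() {
          boost::optional<std::string> nss = lookupNamespace(false);
          // Unconditional get() on an empty optional: with assertions enabled
          // (as in a debug build) this fails boost's is_initialized() check,
          // producing the same message as in this ticket; without assertions
          // it is undefined behavior.
          std::cout << nss.get() << std::endl;
          return 0;
      }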

      This situation can occur because killCursorsWithMatchingSessions() makes several separate calls into the mongos cursor manager, each of which acquires the cursor manager mutex independently. First, it uses getCursorsForSession() to gather the cursors associated with a logical session. Next, after dropping and reacquiring the mutex, it calls getNamespaceForCursorId() in the process of killing these cursors. If, in between these two steps, all of the cursors for one of the relevant namespaces die, then the cursor manager will no longer be tracking that namespace and getNamespaceForCursorId() will return boost::none. In the repro steps, this interleaving is achieved by issuing a killCursors command (via cursor.close()) concurrently with a killSessions command.
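
      For illustration, the following self-contained sketch models that interleaving. All names here (MiniCursorManager, registerCursor, killCursor) are hypothetical toys, not ClusterCursorManager's API; the lookup result is checked before use, which is the kind of guard the unconditional get() is missing.

      #include <boost/optional.hpp>
      #include <cstdint>
      #include <iostream>
      #include <map>
      #include <mutex>
      #include <string>

      // Toy cursor manager: each member function takes the mutex
      // independently, mirroring how killCursorsWithMatchingSessions()
      // makes separate locked calls into the real manager.
      class MiniCursorManager {
      public:
          void registerCursor(std::int64_t id, std::string ns) {
              std::lock_guard<std::mutex> lk(_mutex);
              _cursorIdToNs[id] = std::move(ns);
          }

          void killCursor(std::int64_t id) {
              std::lock_guard<std::mutex> lk(_mutex);
              _cursorIdToNs.erase(id);
          }

          boost::optional<std::string> getNamespaceForCursorId(std::int64_t id) {
              std::lock_guard<std::mutex> lk(_mutex);
              auto it = _cursorIdToNs.find(id);
              if (it == _cursorIdToNs.end()) {
                  return boost::none;  // cursor no longer tracked
              }
              return it->second;
          }

      private:
          std::mutex _mutex;
          std::map<std::int64_t, std::string> _cursorIdToNs;
      };

      int main() {
          MiniCursorManager mgr;
          mgr.registerCursor(123, "test.c");

          std::int64_t cursorId = 123;  // step 1: gathered under the mutex

          mgr.killCursor(cursorId);  // a concurrent killCursors lands in the gap

          auto nss = mgr.getNamespaceForCursorId(cursorId);  // step 2
          if (!nss) {
              // The namespace vanished between the two steps; calling
              // nss.get() here would be the bug described above.
              std::cout << "cursor already gone; skipping" << std::endl;
              return 0;
          }
          std::cout << "would kill cursor on " << *nss << std::endl;
          return 0;
      }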

            Assignee: David Storch (david.storch@mongodb.com)
            Reporter: David Storch (david.storch@mongodb.com)
            Votes: 0
            Watchers: 1

              Created:
              Updated:
              Resolved: