Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Engines
Problem
MongoDB's session pool calls cursor->reset() on every cached cursor before returning a session to the pool (via WiredTigerSession::releaseCursor()). After reset(), the cursor is fully idle — its position is cleared, no transaction is active — but it continues to hold a session_ref count on the underlying data handle indefinitely.
WiredTiger's exclusive-lock protocol for DDL operations (drop, verify, salvage, alter) synchronously waits for all session_ref counts on the target dhandle to reach zero before acquiring the exclusive lock (__wt_conn_dhandle_find → exclusive_access → wait for session_ref == 0). With reset cursors parked in idle pooled sessions, this wait can block indefinitely: the session_ref is never released, because no code path calls cursor->close() on a parked cursor.
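The blocking condition can be illustrated with a toy model. This is a sketch under stated assumptions, not WiredTiger code: every `toy_` name is hypothetical, only the reference count is modelled, and the exclusive check returns false where the real protocol would wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a data handle; only the count is modelled. */
typedef struct {
    int session_ref; /* number of cursors currently pinning the handle */
} toy_dhandle;

/* Hypothetical stand-in for a cursor opened on that handle. */
typedef struct {
    toy_dhandle *dh;
    bool positioned; /* true while the cursor points at a record */
    bool open;       /* true until toy_cursor_close() runs */
} toy_cursor;

void toy_cursor_open(toy_cursor *c, toy_dhandle *dh)
{
    c->dh = dh;
    c->positioned = false;
    c->open = true;
    dh->session_ref++;
}

/* Behaviour described above: reset clears the position but keeps the reference. */
void toy_cursor_reset(toy_cursor *c)
{
    c->positioned = false;
}

/* Only close drops the reference. */
void toy_cursor_close(toy_cursor *c)
{
    if (c->open) {
        c->open = false;
        c->dh->session_ref--;
    }
}

/* Non-blocking form of the exclusive check (the real protocol waits instead). */
bool toy_try_exclusive(const toy_dhandle *dh)
{
    return dh->session_ref == 0;
}
```

In this model, a cursor that has been opened and reset still leaves `toy_try_exclusive()` returning false; only `toy_cursor_close()` lets it succeed, which mirrors the parked-cursor case.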
To work around this, MongoDB calls session->closeAllCursors("") before returning each session to the pool. This releases session_ref by pushing cursors into WiredTiger's internal cursor cache, but it forces a full __curfile_close → __curfile_cache → __curfile_reopen cycle on every session pool round-trip, even though the cursor was already fully reset and ready for use. Profiling of the ycsb_100read workload at thread_level 128 shows ~1.6% flat CPU spent on this unnecessary cycle:
| Symbol | flat% |
|---|---|
| __curfile_close | 0.21% |
| __curfile_cache | 0.22% |
| __wt_cursor_cache_get | 0.28% |
| __curfile_reopen | 0.28% |
| __curfile_reset | 0.12% |
| related overhead | ~0.5% |
88.82% of cursor construction time goes through __session_open_cursor rather than the MongoDB-level cursor cache, confirming that closeAllCursors("") defeats the cache on every OpCtx destruction.
MongoDB investigated removing closeAllCursors("") and replacing it with an explicit DDL sweep: before any exclusive DDL operation, walk all idle sessions in the pool and call closeAllCursors(uri) on each. This approach is fundamentally racy, because sessions that are active (checked out by a writer thread) at sweep time are not covered. When such a session returns to the pool, its cursor is re-cached, re-introducing a session_ref on the dhandle. A draining-set extension was explored, but it introduced a second race window between the drain check and the pool-add, which are not performed under the same lock acquisition. With the draining-set approach, the rollback_views.js test hung at 6177+ retries. After three rounds of correctness fixes, each discovering a new timing window, this approach was abandoned.
The correctness invariant (no pooled session holds a session_ref on a dhandle that is about to be exclusively locked) cannot be maintained reliably from MongoDB's side without structural support from WiredTiger.
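The first race in the sweep approach can be shown with a deterministic toy model. This is a simplified sketch, not MongoDB or WiredTiger code: the `toy_` types and functions are hypothetical, threads are replaced by an explicit ordering of calls, and a checked-out session is assumed to keep its cursor reference when it returns to the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a data handle. */
typedef struct {
    int session_ref;
} toy_dhandle;

/* Hypothetical stand-in for a pooled session with one cached cursor. */
typedef struct {
    bool in_pool;   /* false while checked out by a worker thread */
    bool holds_ref; /* true while its cursor pins the dhandle */
} toy_session;

/* DDL sweep: close cursors on idle (pooled) sessions only. */
void toy_sweep_idle(toy_session *s, size_t n, toy_dhandle *dh)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i].in_pool && s[i].holds_ref) {
            s[i].holds_ref = false;
            dh->session_ref--;
        }
    }
}

/* A worker returns its session; the reset cursor stays cached with its reference. */
void toy_release_to_pool(toy_session *s)
{
    s->in_pool = true;
}

/* Non-blocking form of the exclusive check (the real protocol waits instead). */
bool toy_try_exclusive(const toy_dhandle *dh)
{
    return dh->session_ref == 0;
}
```

With one idle and one checked-out session, the sweep drops only the idle session's reference. Once the second session is returned, the pool again contains a session pinning the dhandle, so `toy_try_exclusive()` still fails.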
Solution
Make WT_CURSOR::reset() release the cursor's session_ref count on the underlying data handle, and re-acquire it lazily on the next positioning call (search, next, prev, search_near).
- In __curfile_reset (or the common cursor reset path), after clearing cursor position and state, call the equivalent of __wt_session_release_dhandle(session) to drop the session_ref on the dhandle. Mark the cursor as dhandle-released (a new internal flag, e.g. WT_CURSTD_DHANDLE_RELEASED).
- In the cursor positioning entry points (__curfile_search, __curfile_next, __curfile_prev, __curfile_search_near, and any other path that accesses btree data), if the cursor is in the DHANDLE_RELEASED state, re-acquire the dhandle reference before proceeding. This is the existing __wt_cursor_reopen / __wt_session_get_dhandle path already used by WiredTiger's internal cursor cache on reopen.
- cursor->close() on a DHANDLE_RELEASED cursor skips the release call (already released) and proceeds to cache placement or final close as today.
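The three steps above can be sketched as a small state machine. This is an illustrative model only, assuming a single boolean in place of the proposed WT_CURSTD_DHANDLE_RELEASED flag; the `toy_` names are hypothetical and none of the functions below are WiredTiger APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a data handle. */
typedef struct {
    int session_ref;
} toy_dhandle;

/* Hypothetical stand-in for a file cursor. */
typedef struct {
    toy_dhandle *dh;
    bool positioned;
    bool dhandle_released; /* models the proposed WT_CURSTD_DHANDLE_RELEASED flag */
} toy_cursor;

/* Drop the reference once; later calls are no-ops. */
static void toy_release_ref(toy_cursor *c)
{
    if (!c->dhandle_released) {
        c->dh->session_ref--;
        c->dhandle_released = true;
    }
}

void toy_open(toy_cursor *c, toy_dhandle *dh)
{
    c->dh = dh;
    c->positioned = false;
    c->dhandle_released = false;
    dh->session_ref++;
}

/* Step 1: reset clears the position and releases the reference. */
void toy_reset(toy_cursor *c)
{
    c->positioned = false;
    toy_release_ref(c);
}

/* Step 2: a positioning call lazily re-acquires the reference. */
void toy_search(toy_cursor *c)
{
    if (c->dhandle_released) {
        c->dh->session_ref++;
        c->dhandle_released = false;
    }
    c->positioned = true;
}

/* Step 3: close skips the release if reset already performed it. */
void toy_close(toy_cursor *c)
{
    c->positioned = false;
    toy_release_ref(c);
}
```

In this model, a reset cursor contributes zero to `session_ref`, a positioned cursor contributes exactly one, and calling `toy_close()` after `toy_reset()` does not decrement the count a second time.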
The target invariant: a cursor in reset state holds zero session_ref; a cursor with an active position holds exactly one session_ref. This is structurally the same invariant WiredTiger's internal cursor cache already enforces when a cursor is placed into the cache via cursor->close() — the change extends it to the reset() call path.
With this change MongoDB can remove session->closeAllCursors("") from WiredTigerConnection::_releaseSession entirely. Parked cursors survive session pool round-trips with zero session_ref, DDL operations proceed without waiting, and the next request that reuses the same session hits the MongoDB-level cursor cache directly with no WiredTiger API call.
- blocks: SERVER-122455 Persist WiredTiger session cursor cache across OpCtx boundaries to eliminate per-request close/reopen (Blocked)
- related to: SERVER-122455 Persist WiredTiger session cursor cache across OpCtx boundaries to eliminate per-request close/reopen (Blocked)