cursor->reset() should release session_ref on the data handle

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines

      Problem

      MongoDB's session pool calls cursor->reset() on every cached cursor before returning a session to the pool (via WiredTigerSession::releaseCursor()). After reset(), the cursor is fully idle — its position is cleared, no transaction is active — but it continues to hold a session_ref count on the underlying data handle indefinitely.

      WiredTiger's exclusive-lock protocol for DDL operations (drop, verify, salvage, alter) synchronously waits for all session_ref counts on the target dhandle to reach zero before acquiring the exclusive lock (__wt_conn_dhandle_findexclusive_access → wait for session_ref == 0). With reset cursors parked in idle pooled sessions this wait can block indefinitely: the session_ref is never released because no code path calls cursor->close() on a parked cursor.
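
      The blocking behaviour described above can be illustrated with a toy model. This is a minimal sketch, not WiredTiger code: the toy_* types and functions are hypothetical stand-ins, and the single polling attempt that returns EBUSY is an assumption standing in for the real synchronous wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); these are not WiredTiger's real structures. */
struct toy_dhandle {
    int session_ref; /* references held by open cursors */
    bool exclusive;  /* set once a DDL operation owns the handle */
};

struct toy_cursor {
    struct toy_dhandle *dh;
    bool positioned;
};

/* Opening a cursor takes a reference on the data handle. */
void toy_cursor_open(struct toy_cursor *c, struct toy_dhandle *dh) {
    c->dh = dh;
    c->positioned = false;
    dh->session_ref++;
}

void toy_cursor_position(struct toy_cursor *c) { c->positioned = true; }

/* Behaviour described in the ticket: reset clears the position only. */
void toy_cursor_reset(struct toy_cursor *c) { c->positioned = false; }

/* Close is the only call that drops the reference. */
void toy_cursor_close(struct toy_cursor *c) {
    c->dh->session_ref--;
    c->dh = NULL;
}

/* One polling attempt; the real protocol would keep waiting instead. */
int toy_try_exclusive(struct toy_dhandle *dh) {
    if (dh->session_ref != 0)
        return EBUSY;
    dh->exclusive = true;
    return 0;
}

/* Returns 0 if the model behaves as the ticket describes. */
int toy_problem_demo(void) {
    struct toy_dhandle dh = {0, false};
    struct toy_cursor c;

    toy_cursor_open(&c, &dh);
    toy_cursor_position(&c);
    toy_cursor_reset(&c); /* idle, but still counted in session_ref */
    if (toy_try_exclusive(&dh) != EBUSY)
        return 1;
    toy_cursor_close(&c); /* only close unblocks the DDL operation */
    return toy_try_exclusive(&dh);
}
```

      In this model a parked cursor that is never closed keeps toy_try_exclusive() returning EBUSY forever, which corresponds to the indefinite wait.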

      To work around this, MongoDB calls session->closeAllCursors("") before returning each session to the pool. This releases session_ref by pushing cursors into WiredTiger's internal cursor cache, but it forces a full __curfile_close → __curfile_cache → __curfile_reopen cycle on every session pool round-trip, even though the cursor was already fully reset and ready for use. Profiling of the ycsb_100read workload at thread_level 128 shows ~1.6% flat CPU spent on this unnecessary cycle:

      Symbol                   flat%
      __curfile_close          0.21%
      __curfile_cache          0.22%
      __wt_cursor_cache_get    0.28%
      __curfile_reopen         0.28%
      __curfile_reset          0.12%
      related overhead         ~0.5%

      88.82% of cursor construction time goes through __session_open_cursor rather than the MongoDB-level cursor cache, confirming that closeAllCursors("") defeats the cache on every OpCtx destruction.

      MongoDB investigated removing closeAllCursors("") and replacing it with an explicit DDL sweep: before any exclusive DDL operation, walk all idle sessions in the pool and call closeAllCursors(uri) on each. This approach is fundamentally racy because sessions that are active (checked out by a writer thread) at sweep time are not covered. When such a session returns to the pool, its cursor is re-cached, which re-introduces a session_ref on the dhandle. A draining-set extension was explored, but it introduced a second race window between the drain check and the pool-add, which are not performed under the same lock acquisition; with that approach the rollback_views.js test hung at 6177+ retries. After three rounds of correctness fixes, each of which uncovered a new timing window, the approach was abandoned. The correctness invariant (no pooled session holds a session_ref on a dhandle that is about to be exclusively locked) cannot be maintained reliably from MongoDB's side without structural support from WiredTiger.
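
      The first race in the sweep approach can be replayed as a fixed interleaving. This is a hedged sketch only: pool_session and the pool_* functions are hypothetical names, there is no real locking or threading, and the model tracks a single dhandle.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_NSESSIONS 3

/* Toy stand-in (hypothetical); not MongoDB's or WiredTiger's real session type. */
struct pool_session {
    bool in_pool;   /* true: idle in the pool; false: checked out by a thread */
    bool holds_ref; /* true: a cached cursor holds a session_ref on the dhandle */
};

/* DDL sweep: only idle (pooled) sessions are visible to the sweeper. */
void pool_sweep_idle(struct pool_session *s, int n) {
    for (int i = 0; i < n; i++)
        if (s[i].in_pool)
            s[i].holds_ref = false; /* models closeAllCursors(uri) */
}

/* A checked-out session comes back and parks its cursor, reference included. */
void pool_return_session(struct pool_session *s) { s->in_pool = true; }

/* References held by pooled sessions; the invariant wants 0 before DDL. */
int pool_parked_refs(const struct pool_session *s, int n) {
    int refs = 0;
    for (int i = 0; i < n; i++)
        if (s[i].in_pool && s[i].holds_ref)
            refs++;
    return refs;
}

/* Replays the interleaving and returns the parked references left behind. */
int pool_race_demo(void) {
    struct pool_session s[POOL_NSESSIONS] = {
        {true, true}, {true, true}, {false, true} /* third one is checked out */
    };

    pool_sweep_idle(s, POOL_NSESSIONS);
    assert(pool_parked_refs(s, POOL_NSESSIONS) == 0); /* sweep looks complete */
    pool_return_session(&s[2]); /* arrives after the sweep, before the DDL lock */
    return pool_parked_refs(s, POOL_NSESSIONS);
}
```

      In this replay pool_race_demo() ends with one parked reference even though the sweep itself saw a clean pool, which is the invariant violation the paragraph above describes.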

      Solution

      Make WT_CURSOR::reset() release the cursor's session_ref count on the underlying data handle, and re-acquire it lazily on the next positioning call (search, next, prev, search_near).

      1. In __curfile_reset (or the common cursor reset path), after clearing cursor position and state, call the equivalent of __wt_session_release_dhandle(session) to drop the session_ref on the dhandle. Mark the cursor as dhandle-released (a new internal flag, e.g. WT_CURSTD_DHANDLE_RELEASED).
      2. In the cursor positioning entry points (__curfile_search, __curfile_next, __curfile_prev, __curfile_search_near, and any other path that accesses btree data), if the cursor is in the DHANDLE_RELEASED state, re-acquire the dhandle reference before proceeding. This is the existing __wt_cursor_reopen / __wt_session_get_dhandle path already used by WiredTiger's internal cursor cache on reopen.
      3. cursor->close() on a DHANDLE_RELEASED cursor skips the release call (already released) and proceeds to cache placement or final close as today.

      The target invariant: a cursor in reset state holds zero session_ref; a cursor with an active position holds exactly one session_ref. This is structurally the same invariant WiredTiger's internal cursor cache already enforces when a cursor is placed into the cache via cursor->close() — the change extends it to the reset() call path.
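
      The three steps and the target invariant can be sketched on a toy model. This is an illustration under stated assumptions, not a patch: lazy_dhandle, lazy_cursor and the lazy_* functions are hypothetical, the dhandle_released field merely models the proposed WT_CURSTD_DHANDLE_RELEASED flag, and real code would go through WiredTiger's dhandle acquire and release paths rather than a bare counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); not WiredTiger's real structures. */
struct lazy_dhandle {
    int session_ref;
};

struct lazy_cursor {
    struct lazy_dhandle *dh;
    bool positioned;
    bool dhandle_released; /* models the proposed WT_CURSTD_DHANDLE_RELEASED */
};

void lazy_open(struct lazy_cursor *c, struct lazy_dhandle *dh) {
    c->dh = dh;
    c->positioned = false;
    c->dhandle_released = false;
    dh->session_ref++;
}

/* Step 1: reset clears the position and drops the reference (idempotent). */
void lazy_reset(struct lazy_cursor *c) {
    c->positioned = false;
    if (!c->dhandle_released) {
        c->dh->session_ref--;
        c->dhandle_released = true;
    }
}

/* Step 2: a positioning call re-acquires the reference lazily. */
void lazy_search(struct lazy_cursor *c) {
    if (c->dhandle_released) {
        c->dh->session_ref++;
        c->dhandle_released = false;
    }
    c->positioned = true;
}

/* Step 3: close skips the release if reset already performed it. */
void lazy_close(struct lazy_cursor *c) {
    if (!c->dhandle_released)
        c->dh->session_ref--;
    c->dhandle_released = true;
    c->positioned = false;
}

/* Walks one cursor through its lifecycle; returns the final session_ref. */
int lazy_lifecycle_demo(void) {
    struct lazy_dhandle dh = {0};
    struct lazy_cursor c;

    lazy_open(&c, &dh);
    lazy_search(&c);
    assert(dh.session_ref == 1); /* active position: exactly one reference */
    lazy_reset(&c);
    lazy_reset(&c);              /* a repeated reset must not double-release */
    assert(dh.session_ref == 0); /* reset state: zero references */
    lazy_search(&c);
    assert(dh.session_ref == 1); /* reference re-acquired on reuse */
    lazy_reset(&c);
    lazy_close(&c);              /* already released, so nothing to drop */
    return dh.session_ref;
}
```

      The guard in lazy_reset() and lazy_close() is what keeps the count from going negative when a cursor is reset more than once or closed after a reset.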

      With this change MongoDB can remove session->closeAllCursors("") from WiredTigerConnection::_releaseSession entirely. Parked cursors survive session pool round-trips with zero session_ref, DDL operations proceed without waiting, and the next request that reuses the same session hits the MongoDB-level cursor cache directly with no WiredTiger API call.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Daniel Hill