ThreadSanitizer: Data race in SpillWiredTigerKVEngine::cleanShutdown - Shutdown Session/Sweep Race

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sweep Server
    • Storage Engines - Foundations

        Evergreen patch: EVG Patch

        Affected Tests (10 of 57):
        1. unsupported_read_write_concerns.js
        2. mongod_waits_for_cms.js
        3. read_concern_target_time_wait.js
        4. simple_replica_set.js
        5. simple_one_node_restart.js
        6. change_stream_can_read_from_secondary.js
        7. large_oplog_batch_application.js
        8. establish_connection_during_stepup.js
        9. transaction_table_oplog_replay.js
        10. replSetInitiate_no_config.js

        Details:

        During StorageEngineImpl::cleanShutdown(), the spill WiredTiger engine's shutdown path closes the WiredTiger connection via wiredtiger_close(). Inside that call, __wti_prefetch_destroy() tears down the prefetch thread group by calling __wt_thread_group_destroy() →
        __thread_group_shrink() → __wt_session_close_internal(), which performs a memset (write) over a WiredTiger session struct in place. Concurrently, WiredTiger's internal sweep server thread (__sweep_server) is still running and calling __wt_session_array_walk(), which
        performs an atomic read of the same session struct memory, so the two accesses race.

        Thread A — shutdown:
          memset ← WRITE (8 bytes)
          __wt_session_close_internal
          __thread_group_shrink
          __wt_thread_group_destroy
          __wti_prefetch_destroy
          __conn_close
          SpillWiredTigerKVEngine::cleanShutdown()
          StorageEngineImpl::cleanShutdown()

        Thread B — sweep server:
          __wt_session_array_walk ← ATOMIC READ (same address)
          __sweep_server

        Root Cause:

        WiredTiger's __conn_close shuts down the prefetch thread group — and closes those threads' sessions — before stopping the sweep server thread. The sweep server's __wt_session_array_walk continues reading the session array during this window, overlapping with the
        memset that zeros out the session struct being freed. This is a WiredTiger-internal shutdown ordering bug: the sweep server should be stopped before any sessions are destroyed.
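
        The ordering described above can be illustrated with a minimal, self-contained C++ sketch. This is not WiredTiger code: Session, Connection, sweepLoop and close are hypothetical stand-ins for the session array, the connection, __sweep_server and __conn_close. It only shows the proposed invariant, assuming the fix is to stop and join the walker thread before any session memory is wiped.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for one slot of the session array.
struct Session {
    std::atomic<bool> active{false};
    int payload = 0;  // plain (non-atomic) state that close() wipes
};

// Hypothetical stand-in for the connection that owns the sessions
// and the background sweep thread.
class Connection {
public:
    static constexpr std::size_t kSessions = 8;

    Connection() {
        for (auto& s : sessions_) {
            s.payload = 1;
            s.active.store(true, std::memory_order_release);
        }
        // Start the walker only after every session is initialized.
        sweeper_ = std::thread([this] { sweepLoop(); });
    }

    ~Connection() { close(); }

    // Safe to call more than once.
    void close() {
        // Step 1: stop and join the walker before touching any session.
        stop_.store(true, std::memory_order_release);
        if (sweeper_.joinable()) {
            sweeper_.join();
        }
        // Step 2: no other thread can be reading the array now, so
        // wiping non-atomic session state cannot race with a walk.
        for (auto& s : sessions_) {
            s.active.store(false, std::memory_order_relaxed);
            s.payload = 0;
        }
    }

    // Counts sessions still marked active (atomic reads only).
    std::size_t activeSessions() const {
        std::size_t n = 0;
        for (const auto& s : sessions_) {
            if (s.active.load(std::memory_order_acquire)) {
                ++n;
            }
        }
        return n;
    }

private:
    // Repeatedly walks the session array until asked to stop.
    void sweepLoop() {
        while (!stop_.load(std::memory_order_acquire)) {
            std::size_t seen = activeSessions();
            assert(seen <= kSessions);
            (void)seen;
            std::this_thread::yield();
        }
    }

    std::array<Session, kSessions> sessions_;
    std::atomic<bool> stop_{false};
    std::thread sweeper_;
};
```

        Constructing a Connection and then calling close() joins the walker first and clears the sessions second. Swapping the two steps in close() would recreate the pattern in the report: a plain write to session memory overlapping an atomic read from the walker.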

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Oscar Ortega
            Votes:
            0
            Watchers:
            4
