Concurrent ident drop during WiredTiger step-up causes crash


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Storage Execution
    • Operating System: ALL

      See BF-42866.

      The crash occurs when:

      1. A node begins stepping up from follower to leader.
      2. TimestampMonitor concurrently determines that the sideWrites ident is eligible for deletion and calls dropIdent.
      3. The drop reaches WiredTiger before the step-up transition has completed.
      4. WiredTiger (during the drain portion of step-up) observes a missing table, panics, and the process crashes.
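      The interleaving in steps 1 to 4 can be modeled as a toy two-thread program. This is a minimal sketch under stated assumptions: ToyEngine, drain, and StorageEnginePanic are hypothetical stand-ins, not WiredTiger or MongoDB APIs, and the two events exist only to force the ordering described above.

```python
import threading


class StorageEnginePanic(RuntimeError):
    """Hypothetical stand-in for a WiredTiger panic (the real server aborts)."""


class ToyEngine:
    """Hypothetical model of the storage engine, not the WiredTiger API."""

    def __init__(self, idents):
        self.tables = set(idents)

    def drop_ident(self, ident):
        self.tables.discard(ident)

    def drain(self, expected):
        # Models the assumption that the table set is stable during step-up.
        for ident in expected:
            if ident not in self.tables:
                raise StorageEnginePanic(f"table {ident!r} missing during step-up drain")


def reproduce():
    engine = ToyEngine({"sideWrites"})
    step_up_started = threading.Event()
    drop_done = threading.Event()
    outcome = {}

    def step_up():
        expected = set(engine.tables)   # step 1: the transition begins
        step_up_started.set()
        drop_done.wait()                # force step 3: the drop lands first
        try:
            engine.drain(expected)      # step 4: drain sees the missing table
            outcome["result"] = "ok"
        except StorageEnginePanic as exc:
            outcome["result"] = str(exc)

    def timestamp_monitor():
        step_up_started.wait()
        engine.drop_ident("sideWrites")  # steps 2-3: no role-transition check
        drop_done.set()

    threads = [threading.Thread(target=step_up),
               threading.Thread(target=timestamp_monitor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["result"]


print(reproduce())  # prints: table 'sideWrites' missing during step-up drain
```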
    • Sprint: Storage Execution 2026-06-08

      Description

      During follower-to-leader step-up, WiredTiger assumes that no concurrent schema operations (for example create, drop, or alter) reach the storage engine. If a schema operation does reach WiredTiger during this window, WiredTiger can panic and terminate the process.

      The TimestampMonitor background thread can drop the sideWrites ident while step-up is still in progress. Because sideWrites is a temporary ident used to capture side writes, not a user-visible collection, its drop does not pass through the writable-primary check that normally serializes collection drops against role transitions. As a result, the drop can reach WiredTiger during the step-up window and trigger a panic.
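      The asymmetry between the two drop paths can be sketched as follows. This is a simplified model: it represents the writable-primary check as a plain role comparison, whereas the real server serializes drops against transitions through its own locking. Role, Node, drop_collection, and drop_temporary_ident are hypothetical names.

```python
from enum import Enum, auto


class Role(Enum):
    SECONDARY = auto()
    TRANSITIONING = auto()
    PRIMARY = auto()


class NotWritablePrimary(Exception):
    pass


class Node:
    """Hypothetical node with two drop paths, modeling the gap in this issue."""

    def __init__(self):
        self.role = Role.SECONDARY
        self.dropped = []

    def drop_collection(self, name):
        # User-visible drop: guarded, so it cannot run mid-transition.
        if self.role is not Role.PRIMARY:
            raise NotWritablePrimary(name)
        self.dropped.append(name)

    def drop_temporary_ident(self, ident):
        # Background drop of an internal ident: no guard at all.
        self.dropped.append(ident)
```

      With `node.role = Role.TRANSITIONING`, `drop_collection` raises while `drop_temporary_ident` succeeds, which is the window the crash exploits.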

      Expected behavior

      Ident drops initiated by TimestampMonitor, or by any similar background path that bypasses the writable-primary check, should be blocked or deferred while a role transition is in progress.
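      One way to get the "deferred" behavior is for the background thread to try the transition lock without blocking and retry on a later tick if a transition holds it. This is a hypothetical sketch: DeferringIdentReaper, schedule, on_tick, and the shared transition_lock are illustrative names, and it assumes step-up and step-down hold the same lock for their whole duration. The lock is held across the drops, so a transition cannot start between the check and the drop.

```python
import threading
from collections import deque


class DeferringIdentReaper:
    """Hypothetical reaper: drops queued idents only when no transition holds the lock."""

    def __init__(self, transition_lock, drop_ident):
        self._lock = transition_lock
        self._drop_ident = drop_ident
        self._pending = deque()

    def schedule(self, ident):
        self._pending.append(ident)

    def on_tick(self):
        # Non-blocking attempt: if a role transition holds the lock,
        # leave the queue untouched and retry on the next tick.
        if not self._lock.acquire(blocking=False):
            return 0
        try:
            dropped = 0
            while self._pending:
                self._drop_ident(self._pending.popleft())
                dropped += 1
            return dropped
        finally:
            self._lock.release()
```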

      Actual behavior

      The drop proceeds without checking whether step-up is in progress, allowing a schema operation to reach WiredTiger during the transition and crash the process.

      Proposed fix

      Add a role-transition guard on the MongoDB side before any dropIdent call that is not already serialized by the writable-primary check.

      At minimum, this should cover:

      • Temporary and side-writes idents dropped by TimestampMonitor
      • Any other ident lifecycle operation that bypasses the standard writable-primary condition

      This ensures that no schema operations are allowed to reach WiredTiger during step-up or step-down.
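      The "blocked" variant of the guard can be sketched as a shared/exclusive lock: ident drops share it, and a role transition takes it exclusively after waiting for in-flight drops to finish. This is a minimal sketch under assumptions: RoleTransitionGuard and guarded_drop_ident are hypothetical names, and a real fix would more likely reuse whatever lock MongoDB already takes for role transitions than add a new one.

```python
import threading
from contextlib import contextmanager


class RoleTransitionGuard:
    """Hypothetical guard: drops hold it shared, transitions hold it exclusively."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_drops = 0
        self._transitioning = False

    @contextmanager
    def ident_drop(self):
        with self._cond:
            # Block while a step-up or step-down is in progress.
            while self._transitioning:
                self._cond.wait()
            self._active_drops += 1
        try:
            yield
        finally:
            with self._cond:
                self._active_drops -= 1
                self._cond.notify_all()

    @contextmanager
    def role_transition(self):
        with self._cond:
            # One transition at a time. Set the flag first so no new drops
            # start, then wait for in-flight drops to drain.
            while self._transitioning:
                self._cond.wait()
            self._transitioning = True
            while self._active_drops:
                self._cond.wait()
        try:
            yield
        finally:
            with self._cond:
                self._transitioning = False
                self._cond.notify_all()


def guarded_drop_ident(guard, drop_ident, ident):
    # What a background path would call instead of dropping the ident directly.
    with guard.ident_drop():
        drop_ident(ident)
```

      Setting the transition flag before waiting for in-flight drops gives transitions priority: a steady stream of background drops cannot starve a step-up, and no new drop can begin once the transition has announced itself.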

            Assignee:
            Thomas Goyne
            Reporter:
            Alexander Pullen
            Votes:
            0
            Watchers:
            8
