Prevent FTDC stalls during DSC step-up by bounding deadlines on $collStats collectors

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Replication
    • ALL
    • Repl 2026-04-27, Repl 2026-05-11

      During DSC step-up, FTDC shows a gap of roughly 2 minutes. The cause is that the oplog.rs.stats and config.transactions.stats collectors run an aggregate through the command framework. The aggregate command's setup path takes the RSTL in IX mode and the Global lock in IS mode, and because FTDC's opCtx has no maxTimeMS, the deadline for acquiring them is Date_t::max(). DSC step-up holds the RSTL in X mode and the Global lock in X mode while stepping up WT, so these aggregations conflict with it and wait until the locks are released.

      To fix this, we can apply the same convention the hand-crafted diagnostic sections already use: set skipRSTLLock = true and use a Date_t::now() deadline with kLeaveUnlocked, so that the FTDC collector gives up cleanly under contention. As a result, during DSC step-up we lose only the per-tick oplog.rs.stats/collectionStats samples instead of freezing the entire FTDC stream.
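
      The intended behavior can be modeled outside the server. The sketch below is not MongoDB code: it uses std::shared_mutex as a stand-in for the lock that step-up holds exclusively, and a non-blocking try-lock as a stand-in for a Date_t::now() deadline. The names Sample, collectBounded, runTick and samplesDuringStepUp are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sample type; real FTDC samples are BSON documents.
struct Sample {
    std::string name;
    long long value;
};

// Try to take the lock in shared mode without waiting. This models a
// deadline of "now": succeed only if the lock is free, otherwise give
// up immediately and leave the lock untouched.
std::optional<Sample> collectBounded(std::shared_mutex& lock,
                                     const std::string& name) {
    std::shared_lock<std::shared_mutex> guard(lock, std::try_to_lock);
    if (!guard.owns_lock()) {
        return std::nullopt;  // contended: skip this collector for this tick
    }
    return Sample{name, 0};  // placeholder value, not a real statistic
}

// One collection tick: run every collector and keep whatever succeeded.
std::vector<Sample> runTick(std::shared_mutex& lock) {
    static const std::vector<std::string> kCollectors = {
        "oplog.rs.stats", "config.transactions.stats"};
    std::vector<Sample> out;
    for (const auto& name : kCollectors) {
        if (auto sample = collectBounded(lock, name)) {
            out.push_back(*sample);
        }
    }
    return out;
}

// Model step-up: hold the lock exclusively while another thread runs a
// tick. The join returns promptly because the collectors never block.
std::size_t samplesDuringStepUp(std::shared_mutex& lock) {
    std::unique_lock<std::shared_mutex> exclusive(lock);
    std::vector<Sample> collected;
    std::thread collector([&] { collected = runTick(lock); });
    collector.join();
    return collected.size();
}
```

      In this model a tick that overlaps the exclusive hold simply returns fewer samples, and the next tick after release collects normally. With an unbounded wait in place of the try-lock, the collector thread would instead stay blocked for as long as the exclusive lock is held.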

            Assignee:
            Ali Mir
            Reporter:
            Ali Mir
            Votes:
            0
            Watchers:
            4

              Created:
              Updated: