Investigate why query_stats_concurrent times out when using sample-based rate limiting at 100% vs window-based unlimited

    • Type: Task
    • Resolution: Unresolved
    • Priority: Critical - P2
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      Description:

      While enabling 1% query stats sampling by default (SERVER-116390), the query_stats_concurrent FSM workload was observed to time out (~90 minutes, no assertion failures) when internalQueryStatsSampleRate is set to 1 in the test's setup function.

      Setting internalQueryStatsSampleRate: 0 and relying on the existing window-based unlimited path (internalQueryStatsRateLimit: -1) does not exhibit this behavior. This matches the test's behavior before SERVER-116390.

      Both configurations result in 100% query stats capture for reads, so the performance difference is unexpected. 

      Steps to reproduce:

      1. In jstests/concurrency/fsm_workloads/query/query_stats/query_stats_concurrent.js, set internalQueryStatsSampleRate: 1 in setup (alongside the existing internalQueryStatsRateLimit: -1)
      2. Run the concurrency suite against this workload
      3. Observe that the task runs for ~90 minutes with no assertion failures until it hits the task timeout
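
      The two configurations being compared can be sketched as setParameter command documents. This is only an illustration: the parameter names and values are taken from this ticket, while the helper name buildQueryStatsSetParameter and the "sample"/"window" mode labels are hypothetical and do not exist in the workload file.

```javascript
// Hypothetical helper (not part of query_stats_concurrent.js): builds the
// setParameter command document for each configuration described in this ticket.
function buildQueryStatsSetParameter(mode) {
    // Both configurations keep the existing window-based unlimited setting.
    const cmd = {setParameter: 1, internalQueryStatsRateLimit: -1};
    if (mode === "sample") {
        // Sample-based path at 100%: the configuration that times out.
        cmd.internalQueryStatsSampleRate = 1;
    } else if (mode === "window") {
        // Workaround from SERVER-116390: disable sampling so the
        // window-based unlimited path handles capture.
        cmd.internalQueryStatsSampleRate = 0;
    } else {
        throw new Error("unknown mode: " + mode);
    }
    return cmd;
}

// Assumed usage inside a workload's setup function (mongo shell test context):
//   assert.commandWorked(db.adminCommand(buildQueryStatsSetParameter("sample")));
```

      Keeping internalQueryStatsRateLimit at -1 in both documents means the only variable between the two runs is internalQueryStatsSampleRate.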

      Expected: Both paths at 100% capture should have comparable performance.

      Workaround (applied in SERVER-116390): Set internalQueryStatsSampleRate: 0 so the window-based unlimited path handles capture.

            Assignee:
            Erin Liang
            Reporter:
            Erin Liang