Investigate >60% OperationThroughput regression in DSC vs ASC for data_handle_locust workload


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: DHandles
    • Storage Engines - Foundations
    • 380.787
    • SE Foundations - 2026-06-23
    • 8

      Background

      SLS-6071 reports that Disaggregated Storage (DSC) shows an OperationThroughput regression of more than 60% compared with classic MongoDB (ASC) on the data_handle_locust workload. Latency is also higher on DSC, most sharply at the 95th percentile for write operations. The regression was first surfaced in PERF-7206.

      Observed Numbers

      Test Name   Measurement              ASC (Classic)   DSC        DSC vs ASC (%)
      CreateOne   Latency50thPercentile    12.782          29.2512    +128.85%
      CreateOne   Latency95thPercentile    36.155          334.2754   +824.56%
      CreateOne   OperationThroughput      18.19           6.72       -63.03%
      FindOne     Latency50thPercentile    0.432           0.5132     +18.80%
      FindOne     Latency95thPercentile    0.867           1.309      +50.98%
      FindOne     OperationThroughput      13894.99        5187.71    -62.66%
      UpdateOne   Latency50thPercentile    12.025          29.2082    +142.90%
      UpdateOne   Latency95thPercentile    34.331          331.642    +865.91%
      UpdateOne   OperationThroughput      19.47           6.70       -65.50%
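
      The DSC vs ASC column appears to be the relative change (DSC - ASC) / ASC x 100. The small Python sketch below recomputes it from the values in the table; because it starts from the rounded figures shown, a few results differ from the reported column by up to about 0.1 percentage points. It also prints the fold change (DSC / ASC), which makes the write-path p95 gap easier to read: roughly 9.2x for CreateOne and 9.7x for UpdateOne. The names RESULTS, pct_change and fold are illustrative only.

```python
# Values copied from the Observed Numbers table: (test, measurement) -> (ASC, DSC).
RESULTS = {
    ("CreateOne", "Latency50thPercentile"): (12.782, 29.2512),
    ("CreateOne", "Latency95thPercentile"): (36.155, 334.2754),
    ("CreateOne", "OperationThroughput"): (18.19, 6.72),
    ("FindOne", "Latency50thPercentile"): (0.432, 0.5132),
    ("FindOne", "Latency95thPercentile"): (0.867, 1.309),
    ("FindOne", "OperationThroughput"): (13894.99, 5187.71),
    ("UpdateOne", "Latency50thPercentile"): (12.025, 29.2082),
    ("UpdateOne", "Latency95thPercentile"): (34.331, 331.642),
    ("UpdateOne", "OperationThroughput"): (19.47, 6.70),
}


def pct_change(asc: float, dsc: float) -> float:
    """Relative change of DSC against the ASC baseline, in percent."""
    return (dsc - asc) / asc * 100.0


def fold(asc: float, dsc: float) -> float:
    """DSC value expressed as a multiple of the ASC value."""
    return dsc / asc


if __name__ == "__main__":
    for (test, measurement), (asc, dsc) in RESULTS.items():
        print(f"{test:<10} {measurement:<22} "
              f"{pct_change(asc, dsc):+8.2f}%  {fold(asc, dsc):6.2f}x")
```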

      Goal

      Investigate the root cause of the DSC performance gap on the high-active-dhandle workload and identify potential improvements. Per the comment in SLS-6071, the regression may be related to DSC's use of layered tables.

      Suggested Investigation Areas

      • Profile DSC under the data_handle_locust workload and compare its hot paths against ASC
      • Examine data handle open/close/sweep costs in DSC (layered table overhead vs standard btree)
      • Check whether dhandle cache eviction or sweep behaviour differs significantly between ASC and DSC
      • Review lock contention on the dhandle list in the DSC layered-table code path
      • Compare checkpoint and reconciliation costs between ASC and DSC under this workload
      • Assess whether the 95th-percentile latency spikes on write operations (about 9.2x the ASC value for CreateOne and 9.7x for UpdateOne) point to periodic stalls (e.g. flush/ingest)
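
      For the last item, one possible approach is to bucket per-operation latencies into fixed time windows, flag windows whose p95 is far above the typical window, and check how regularly the flagged windows recur. The Python sketch below assumes a per-operation trace of (timestamp in seconds, latency in ms) pairs exported from the workload; the function names, the 1 s window and the 4x threshold are hypothetical choices, not existing tooling. It uses a nearest-rank percentile, which may not match how the harness computes its percentile metrics. A low coefficient of variation in the gaps between stall episodes would hint at a periodic source, which could then be lined up against checkpoint or ingest timestamps.

```python
# Hypothetical helpers for analysing a latency trace; names and thresholds are illustrative.
import math
import statistics
from collections import defaultdict


def percentile(values, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def window_p95(samples, window_s=1.0):
    """Group (timestamp_s, latency_ms) samples into fixed windows; return {window_start: p95}."""
    buckets = defaultdict(list)
    for ts, latency in samples:
        buckets[math.floor(ts / window_s) * window_s].append(latency)
    return {start: percentile(lats, 95) for start, lats in sorted(buckets.items())}


def stall_windows(p95_by_window, factor=4.0):
    """Window starts whose p95 exceeds `factor` times the median window p95."""
    baseline = statistics.median(p95_by_window.values())
    return [start for start, p95 in p95_by_window.items() if p95 > factor * baseline]


def stall_period(stalls, window_s=1.0):
    """Merge adjacent stalled windows into episodes and return
    (mean gap between episode starts, coefficient of variation of the gaps),
    or None when there are too few episodes to judge periodicity."""
    episodes = [s for i, s in enumerate(stalls)
                if i == 0 or s - stalls[i - 1] > 1.5 * window_s]
    if len(episodes) < 3:
        return None
    gaps = [b - a for a, b in zip(episodes, episodes[1:])]
    mean_gap = statistics.mean(gaps)
    return mean_gap, statistics.pstdev(gaps) / mean_gap


if __name__ == "__main__":
    # Synthetic trace (not measured data): 100 ops/s at 12 ms, where 20% of
    # the operations in every tenth second hit a 300 ms stall.
    trace = []
    for second in range(60):
        for i in range(100):
            stalled = second % 10 == 5 and i < 20
            trace.append((second + i / 100.0, 300.0 if stalled else 12.0))
    stalls = stall_windows(window_p95(trace))
    print(stalls)                # [5.0, 15.0, 25.0, 35.0, 45.0, 55.0]
    print(stall_period(stalls))  # (10.0, 0.0)
```

      In this synthetic trace the stalled windows have a p95 of 300 ms while their p50 stays at 12 ms, which illustrates how a stall that hits only a fraction of operations can widen the p95 gap far more than the p50 gap.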

      References

            Assignee: Donald Anderson
            Reporter: Sid Mahajan
            Votes: 0
            Watchers: 1
