Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Concurrency
Labels:
None

Assigned Teams:

Storage Engines - Persistence
Total Hours with Assigned Team:
1,308.621
Epic Link:
SPM-4764
Sprint:
SE Persistence backlog
Story Points:
None

Summary

The product performance team is evaluating regressions when upgrading from Linux kernel 6.1 (CFS scheduler) to kernel 6.12 (EEVDF scheduler) on MongoDB Atlas. Two distinct regressions have been identified:

(a) Write throughput: ycsb_100update −10.7%

(b) Read latency: mixed-workload read p50 +228–300% (ecommerce FindOneProduct p50 +300%, ycsb 50/50 read p50 +228%). This regression is slice-independent and requires concurrent writes: read-only ycsb_100read on 6.12 is clean or faster.

Root cause (corrected from original description)

The yield that tracks the regression is in MongoDB's transport layer: ServiceExecutor::yieldIfAppropriate → std::this_thread::yield() (service_executor.cpp), called twice per request via SessionWorkflow::Impl (after send and before receive). Under thread-per-connection at 128 connections on 8 vCPUs, the guard runningThreads > cores is always true, so every request yields twice.
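The guard described above can be sketched as follows. This is a minimal illustration, not the code in service_executor.cpp: the function names shouldYield and yieldIfOversubscribed and their parameters are hypothetical, and only the runningThreads > cores comparison and the call to std::this_thread::yield() come from the description.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the oversubscription guard; not MongoDB's code.
// True when more threads are runnable than there are cores.
bool shouldYield(unsigned runningThreads, unsigned cores) {
    return runningThreads > cores;
}

// Yields at most once and reports whether a yield was issued.
bool yieldIfOversubscribed(unsigned runningThreads, unsigned cores) {
    assert(cores > 0);
    if (!shouldYield(runningThreads, cores)) {
        return false;
    }
    // With libstdc++ on Linux this ends up in sched_yield().
    std::this_thread::yield();
    return true;
}
```

With the configuration from this ticket, yieldIfOversubscribed(128, 8) takes the yield path on every call, which is why the guard provides no relief under thread-per-connection.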

On EEVDF, each sched_yield() costs ~one full base slice (~2.8ms) instead of CFS's cheap requeue. At 2 yields/request the per-op off-CPU cost rises from 7% to 19.6% of thread time. The throughput-probing admission controller converges to a smaller write-ticket pool (28→16), collapsing admitted write concurrency (3.35→2.45 active writers) and producing the −11% throughput drop.
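One rough way to compare per-yield cost between the two kernels is a small oversubscription microbenchmark. The sketch below is an assumption-laden illustration, not the harness used for the numbers above: meanYieldMicros is a hypothetical helper, and it reports wall-clock time per std::this_thread::yield() call while several threads yield concurrently, so it includes time spent waiting for other threads as well as scheduler overhead.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: mean wall-clock microseconds per
// std::this_thread::yield() call when `threads` workers yield concurrently.
double meanYieldMicros(unsigned threads, unsigned yieldsPerThread) {
    assert(threads > 0 && yieldsPerThread > 0);
    using Clock = std::chrono::steady_clock;

    std::vector<double> perThreadMicros(threads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(threads);

    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&perThreadMicros, t, yieldsPerThread] {
            const auto start = Clock::now();
            for (unsigned i = 0; i < yieldsPerThread; ++i) {
                std::this_thread::yield();
            }
            const std::chrono::duration<double, std::micro> elapsed =
                Clock::now() - start;
            // Each worker writes only its own slot, so no locking is needed.
            perThreadMicros[t] = elapsed.count() / yieldsPerThread;
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    double sum = 0.0;
    for (double micros : perThreadMicros) {
        sum += micros;
    }
    return sum / threads;
}
```

To approximate the setup in this ticket, one would call it with a thread count well above the vCPU count, for example meanYieldMicros(128, 1000) on an 8-vCPU host, once on each kernel. Because the workers do no real work between yields, absolute values will differ from those seen in a running server.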

Mitigations evaluated so far

base_slice_ns=750µs host knob: partially recovers the write path (−10.7%→−6.5% vs CFS), net-positive over plain 6.12 suite-wide, but does not address (b) and adds small regressions of its own.
Patch F: remove the after-send _yieldPointReached() call in session_workflow.cpp (yields 2×/req → 1×/req). This closes the ycsb_100update gap in CPU-bound workloads (−10.7%→−0.3%, ns) but does not fix (b). The before-receive yield, which Patch F keeps, is the lever for (b): removing both yields halves read p50 on 50read50 (1259→626µs) at a −9% throughput / +17–22% write-latency cost. The yields were originally added to help tail-latency performance (see BF-27452 / SERVER-125097).
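The three yield configurations discussed above (baseline, Patch F, and both yields removed) can be modeled with a toy request loop. This is a hypothetical sketch for counting yield points only: YieldPoints and countYields are illustrative names, not identifiers from session_workflow.cpp, and the loop does no real I/O.

```cpp
#include <cassert>

// Hypothetical model of the two per-request yield points; not MongoDB code.
struct YieldPoints {
    bool afterSend;
    bool beforeReceive;
};

// Counts yield-point hits over `requests` iterations of a
// receive -> process -> send loop.
unsigned long countYields(unsigned long requests, YieldPoints points) {
    unsigned long yields = 0;
    for (unsigned long i = 0; i < requests; ++i) {
        if (points.beforeReceive) {
            ++yields;  // before-receive yield point (kept by Patch F)
        }
        // Receive, process, and send would happen here.
        if (points.afterSend) {
            ++yields;  // after-send yield point (removed by Patch F)
        }
    }
    return yields;
}
```

For 1000 requests, the baseline YieldPoints{true, true} gives 2000 yields, Patch F as YieldPoints{false, true} gives 1000, and YieldPoints{false, false} gives 0, matching the 2×/req → 1×/req → none progression.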

Ask

Looking for WT input on whether there is a cheaper cooperative-yield primitive for the transport layer's oversubscription case. Something that relinquishes the CPU cooperatively without incurring a full EEVDF base-slice descheduling.

Is related to: SERVER-125097 Remove redundant per-request sched_yield in SessionWorkflow (Closed)

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Jawwad Asghar
Votes: 0
Watchers: 7

Created: Jun 03 2026 06:39:21 PM UTC
Updated: Jun 09 2026 05:37:42 PM UTC
