Type: Task
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: perf-optimization-finder
Assigned Teams: Product Performance
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Problem

Off-CPU profiling of YCSB 100read at 128 threads shows connection threads blocking in pthread_cond_signal while dispatching mirror requests, with 10–16ms of off-CPU time per occurrence. MirrorMaestroImpl::tryMirror runs on the service executor thread for every mirror-eligible command and calls ExecutorFuture(_executor).getAsync(...), which goes through GuaranteedExecutor::schedule → ThreadPoolTaskExecutor::scheduleWork → ThreadPool::Impl::schedule. That last call acquires the ThreadPool task-queue mutex, enqueues the task, and calls _workAvailable.notify_one(). With 128 threads scheduling mirror requests concurrently, the ThreadPool mutex and condition variable become contention points, and the per-request pthread_cond_signal cost dominates the dispatch path.
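
The contention pattern can be modeled with a minimal, hypothetical thread pool. SimplePool is illustrative only, not MongoDB's ThreadPool. Every schedule() call takes the single queue mutex, pushes one task, and calls notify_one(), which libstdc++ on Linux implements with pthread_cond_signal. The sketch assumes the signal is sent while the lock is held; this ticket does not say whether the real ThreadPool::Impl::schedule signals inside or outside the lock. N connection threads that each issue M unbatched mirror requests therefore produce N × M lock acquisitions and N × M signals.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Simplified, hypothetical stand-in for ThreadPool::Impl::schedule:
// each task costs one mutex acquisition plus one notify_one on the shared queue.
class SimplePool {
public:
    explicit SimplePool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            _workers.emplace_back([this] { _run(); });
    }

    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _workAvailable.notify_all();
        for (auto& t : _workers)
            t.join();  // workers drain remaining tasks before exiting
    }

    void schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lk(_mutex);  // every caller serializes here
        assert(!_shutdown);
        _tasks.push_back(std::move(task));
        ++_scheduleCalls;
        _workAvailable.notify_one();  // one condition-variable signal per task
    }

    std::size_t scheduleCalls() const { return _scheduleCalls.load(); }

private:
    void _run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(_mutex);
                _workAvailable.wait(lk, [this] { return _shutdown || !_tasks.empty(); });
                if (_tasks.empty())
                    return;  // shut down and fully drained
                task = std::move(_tasks.front());
                _tasks.pop_front();
            }
            task();
        }
    }

    std::mutex _mutex;
    std::condition_variable _workAvailable;
    std::deque<std::function<void()>> _tasks;
    std::atomic<std::size_t> _scheduleCalls{0};
    bool _shutdown = false;
    std::vector<std::thread> _workers;
};
```

With 8 connection threads each scheduling 100 mirror requests one at a time, the queue sees 800 lock acquisitions and 800 signals; grouping the same requests 16 at a time would need only 50 schedule() calls.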

Solution

Replace the synchronous, one-getAsync-per-mirror dispatch in MirrorMaestroImpl::tryMirror (src/mongo/db/mirror_maestro.cpp) with a thread-local batch queue, so that a single ThreadPool schedule call covers many mirror requests.

Each connection thread accumulates its mirror requests in a per-thread PerThreadBatch, held by shared_ptr so that trailing entries survive the thread's exit. When the batch reaches kBatchSize=16, the entire batch is moved into a single getAsync task that runs the existing _mirror loop for every accumulated request.

A 100ms periodic flusher (registered via ServiceContext::getPeriodicRunner()->makeJob and held in a PeriodicJobAnchor) walks _batchRegistry, the registry of all per-thread batches, and drains any partial batches. This keeps the mirroring metrics (gMirroredReadsSection.sent, .pending) converging within the existing assert.soon budgets used by the mirror-reads jstests.

Per-instance batch ownership is tracked by a monotonic _instanceId, so test fixtures that destroy and recreate MirrorMaestroImpl on the same thread re-register cleanly on debug/TSAN builds, even when the new instance reuses the old instance's address.
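
A condensed sketch of the batching scheme follows, under stated assumptions. BatchingMirrorDispatcher, ScheduleFn, and MirrorRequest are hypothetical names. The ScheduleFn callback stands in for one ExecutorFuture(_executor).getAsync(...) dispatch. flushAll() is the body that the 100ms PeriodicRunner job would call; the job registration and PeriodicJobAnchor are omitted. Each per-thread batch sits behind a shared_ptr that the registry also holds, and the thread-local slot is keyed by a monotonic instance id rather than by `this`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins: one mirror request, and one executor dispatch
// (in the ticket, one ExecutorFuture(_executor).getAsync(...) call).
using MirrorRequest = std::function<void()>;
using ScheduleFn = std::function<void(std::function<void()>)>;

class BatchingMirrorDispatcher {
public:
    static constexpr std::size_t kBatchSize = 16;

    explicit BatchingMirrorDispatcher(ScheduleFn schedule)
        : _schedule(std::move(schedule)), _instanceId(_nextInstanceId.fetch_add(1)) {}

    // Runs on the connection thread for every mirror-eligible command.
    void tryMirror(MirrorRequest request) {
        std::shared_ptr<PerThreadBatch> batch = _batchForThisThread();
        std::vector<MirrorRequest> full;
        {
            std::lock_guard<std::mutex> lk(batch->mutex);
            batch->pending.push_back(std::move(request));
            if (batch->pending.size() < kBatchSize)
                return;  // common case: no executor interaction at all
            full.swap(batch->pending);
        }
        _dispatch(std::move(full));  // one schedule call for kBatchSize requests
    }

    // Body the periodic (e.g. 100ms) flusher would run: drain partial batches.
    void flushAll() {
        std::vector<std::shared_ptr<PerThreadBatch>> batches;
        {
            std::lock_guard<std::mutex> lk(_registryMutex);
            batches = _batchRegistry;
        }
        for (const auto& batch : batches) {
            std::vector<MirrorRequest> partial;
            {
                std::lock_guard<std::mutex> lk(batch->mutex);
                partial.swap(batch->pending);
            }
            if (!partial.empty())
                _dispatch(std::move(partial));
        }
    }

private:
    struct PerThreadBatch {
        std::mutex mutex;  // contended only by the owning thread and the flusher
        std::vector<MirrorRequest> pending;
    };

    // One slot per thread, keyed by instance id rather than by `this`, so a
    // dispatcher recreated at a reused address still registers a fresh batch.
    struct ThreadSlot {
        std::uint64_t instanceId = 0;  // ids start at 1, so 0 never matches
        std::shared_ptr<PerThreadBatch> batch;
    };

    std::shared_ptr<PerThreadBatch> _batchForThisThread() {
        thread_local ThreadSlot slot;
        if (slot.instanceId != _instanceId) {
            slot.batch = std::make_shared<PerThreadBatch>();
            slot.instanceId = _instanceId;
            // The registry shares ownership, so entries queued by a thread that
            // later exits are still drained by the next flushAll().
            std::lock_guard<std::mutex> lk(_registryMutex);
            _batchRegistry.push_back(slot.batch);
        }
        return slot.batch;
    }

    void _dispatch(std::vector<MirrorRequest> requests) {
        assert(!requests.empty());
        _schedule([requests = std::move(requests)] {
            for (const auto& request : requests)
                request();  // stands in for running the existing _mirror loop
        });
    }

    static inline std::atomic<std::uint64_t> _nextInstanceId{1};

    ScheduleFn _schedule;
    const std::uint64_t _instanceId;
    std::mutex _registryMutex;
    std::vector<std::shared_ptr<PerThreadBatch>> _batchRegistry;
};
```

The sketch assumes at most one live dispatcher per thread at a time: a second live instance would force the single thread-local slot to re-register on every switch. It also never prunes batches of exited threads from the registry. A production version would need to handle both.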

Assignee: Jawwad Asghar
Reporter: Jawwad Asghar
Participants: Jawwad Asghar
Votes: 0
Watchers: 3

Created: May 29 2026 03:43:59 PM UTC
Updated: Jun 01 2026 10:44:47 PM UTC
Resolved: Jun 01 2026 10:44:47 PM UTC
