Cache CollectionShardingStateMap::getOrCreate result per Client

      Problem

      CollectionShardingStateMap::getOrCreate is invoked on every collection acquisition through the shard-role path (ServiceEntryPointShardRole::handleRequest → acquireCollectionsOrViewsLockFree → acquireServicesSnapshot → ScopedCollectionShardingState::acquire). Each call takes a std::shared_lock on the map's RWMutex (incrementing and decrementing a single shared cache line via _state.addAndFetch/subtractAndFetch across all connection threads), then hashes the NamespaceString and runs an stdx::unordered_map::find. On YCSB 128-thread in-cache reads (ARM Graviton 2), pprof attributes 0.15–0.22% flat / 0.17–0.23% cumulative to getOrCreate alone, and the full call chain accounts for ~1.6% cumulative at acquireResolvedCollectionsOrViewsWithoutTakingLocks. Workloads that target a single collection (single-collection OLTP, YCSB-style benchmarks) repeat this lookup on every operation and pay the RWMutex cache-line-bouncing cost, even though every Client thread is asking about the same NSS over and over.

      Solution

      Add a single-slot Client decoration, ClientCSSCache{NamespaceString nss; CSSAndLock* cssAndLock;}, and consult it at the top of getOrCreate. On a hit (cache.cssAndLock != nullptr && cache.nss == nss), return the cached pointer directly, skipping both the shared_lock and the hashmap find. On a miss, fall through to the unchanged slow path and populate the cache after the lookup completes. Safety relies on the existing in-file invariant (collection_sharding_state.cpp lines 119–120: "Entries of the _collections map must never be deleted or replaced"), which guarantees a CSSAndLock* stays valid for the lifetime of the ServiceContext; the inner CollectionShardingState (and its concurrency protocol via cssMutex) is unchanged. Per-Client scope (rather than per-OperationContext) gives a near-100% hit rate for single-collection workloads, since one Client serves many sequential operations on its connection thread, while admin commands and background workers run on their own Clients with their own, initially empty, caches and cannot pollute another Client's cache.

            Assignee:
            Jawwad Asghar
            Reporter:
            Jawwad Asghar