Cache CollectionShardingStateMap::getOrCreate result per Client

      Problem

      CollectionShardingStateMap::getOrCreate is invoked on every collection acquisition through the shard-role path (ServiceEntryPointShardRole::handleRequest → acquireCollectionsOrViewsLockFree → acquireServicesSnapshot → ScopedCollectionShardingState::acquire). Each call takes a std::shared_lock on the map's RWMutex (incrementing and decrementing a single shared cache line via _state.addAndFetch/subtractAndFetch across all connection threads), then hashes the NamespaceString and runs an stdx::unordered_map::find. On YCSB 128-thread in-cache reads (ARM Graviton 2), pprof attributes 0.15–0.22% flat / 0.17–0.23% cumulative to getOrCreate alone, and the full call chain accounts for ~1.6% cumulative at acquireResolvedCollectionsOrViewsWithoutTakingLocks. Workloads that target a single collection (single-collection OLTP, YCSB-style benchmarks) repeat this lookup on every operation and pay the RWMutex cache-line-bouncing cost, even though every Client thread is asking about the same NSS over and over.

      Solution

      Add a single-slot Client decoration, ClientCSSCache{NamespaceString nss; CSSAndLock* cssAndLock;}, and consult it at the top of getOrCreate. On a hit (cache.cssAndLock != nullptr && cache.nss == nss), return the cached pointer directly, skipping both the shared_lock and the hashmap find. On a miss, fall through to the unchanged slow path and populate the cache after the lookup completes. Safety relies on the existing in-file invariant (collection_sharding_state.cpp lines 119–120: "Entries of the _collections map must never be deleted or replaced"), which guarantees a CSSAndLock* stays valid for the lifetime of the ServiceContext; the inner CollectionShardingState (and its concurrency protocol via cssMutex) is unchanged. Per-Client scope (rather than per-OperationContext) gives a near-100% hit rate for single-collection workloads, since one Client serves many sequential operations on its connection thread, while admin commands and background workers run on their own Clients with their own, initially empty, caches and cannot pollute another Client's cache.

            Assignee:
            Jawwad Asghar
            Reporter:
            Jawwad Asghar