[SERVER-82358] Reduce the synchronization cost of `LockedClientsCursor` Created: 20/Oct/23  Updated: 24/Jan/24

Status: Open
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-76723 Create FTDC stall monitor Open
Assigned Teams:
Service Arch
Backport Requested:
v7.3
Participants:

Description

This primarily targets serverStatus, and more specifically the FTDC thread that runs this command to collect metrics on active operations. Today, every invocation of serverStatus must exclusively lock the ServiceContext, then iterate through the list of Client objects, lock each one individually (using a spin-lock), and check its associated OperationContext. Under heavy load, if either of these locks is contended, metrics collection may stall.
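To make the two-level locking concrete, here is a minimal, simplified model of the pattern described above. It is not the server's actual code: `SpinLock`, `ClientModel`, `ServiceContextModel`, and `countActiveOperations` are hypothetical stand-ins, and the `hasActiveOperation` flag stands in for checking the associated OperationContext.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a per-client spin-lock (busy-waits until acquired).
class SpinLock {
public:
    void lock() {
        while (_flag.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() {
        _flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag _flag = ATOMIC_FLAG_INIT;
};

// Hypothetical stand-in for a Client; the flag models "has an OperationContext".
struct ClientModel {
    SpinLock lock;
    bool hasActiveOperation = false;
};

// Hypothetical stand-in for the ServiceContext: one mutex guards the whole client list.
struct ServiceContextModel {
    std::mutex mutex;
    std::vector<std::unique_ptr<ClientModel>> clients;
};

// Holds the registry mutex for the entire scan and takes every client's
// spin-lock in turn. Contention on either level delays the caller.
std::size_t countActiveOperations(ServiceContextModel& svc) {
    std::lock_guard<std::mutex> registryLock(svc.mutex);
    std::size_t active = 0;
    for (auto& client : svc.clients) {
        std::lock_guard<SpinLock> clientLock(client->lock);
        if (client->hasActiveOperation) {
            ++active;
        }
    }
    return active;
}
```

In this model the registry mutex is held for the whole iteration, so the scan's duration grows with the number of clients and with the wait on each client lock, and any thread that needs the registry mutex is blocked for that long.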

The idea is to either bound the time spent waiting to acquire the ServiceContext mutex, or redesign the synchronization primitive (e.g. partition it) so that it scales better and is less susceptible to contention during spikes in operations.
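Both options can be sketched together, under stated assumptions. The `PartitionedRegistry` class, its method names, the integer client ids, and the partition count of 8 are hypothetical and chosen only for illustration. The sketch uses the standard `std::timed_mutex` to bound each wait and splits the client set across independently locked partitions.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <unordered_set>

// Hypothetical sketch: clients are spread over several independently locked
// partitions, and readers give up after a bounded wait instead of stalling.
class PartitionedRegistry {
public:
    static constexpr std::size_t kPartitions = 8;  // assumed value, not tuned

    void add(int clientId) {
        Partition& p = _partitionFor(clientId);
        std::lock_guard<std::timed_mutex> lk(p.mutex);
        p.ids.insert(clientId);
    }

    void remove(int clientId) {
        Partition& p = _partitionFor(clientId);
        std::lock_guard<std::timed_mutex> lk(p.mutex);
        p.ids.erase(clientId);
    }

    // Returns the number of registered clients, or std::nullopt if any
    // partition lock could not be acquired within `perPartitionBudget`.
    std::optional<std::size_t> tryCount(std::chrono::milliseconds perPartitionBudget) {
        std::size_t total = 0;
        for (Partition& p : _partitions) {
            // This unique_lock constructor calls try_lock_for on the mutex.
            std::unique_lock<std::timed_mutex> lk(p.mutex, perPartitionBudget);
            if (!lk.owns_lock()) {
                return std::nullopt;
            }
            total += p.ids.size();
        }
        return total;
    }

private:
    struct Partition {
        std::timed_mutex mutex;
        std::unordered_set<int> ids;
    };

    Partition& _partitionFor(int clientId) {
        return _partitions[std::hash<int>{}(clientId) % kPartitions];
    }

    std::array<Partition, kPartitions> _partitions;
};
```

A metrics collector could call `tryCount(std::chrono::milliseconds(50))` and, on `std::nullopt`, report the sample as unavailable rather than block. The trade-off is that the result is no longer an atomic snapshot across all clients, since partitions are locked one at a time, and the worst-case wait is the per-partition budget multiplied by the number of partitions.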


Generated at Thu Feb 08 06:49:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.