Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Networking & Observability
Context
The lock-free serverStatus effort (SPM-4755) targets slow serverStatus sections, which contribute to FTDC gaps when collection exceeds 1 second. Analysis in SERVER-124926 and the slow serverStatus spreadsheet shows that the shardingStatistics section is a significant contributor. The spreadsheet data was collected from live Atlas clusters.
Owning team
Suggested owner (from CODEOWNERS): Catalog and Routing – Routing and Topology (@10gen/server-catalog-and-routing-routing-and-topology)
Impact (from slow serverStatus logs)
- Tab: mongos
- Total duration share: 2.7% of slow-run section time
- Avg duration when slow: 473 ms
- Rows where section exceeded 1s: 2.4% of slow runs
Code location
- Section registration: ServerStatusSectionBuilder<ShardingStatisticsServerStatus>("shardingStatistics")
- Primary files:
  - src/mongo/db/sharding_environment/s_sharding_server_status.cpp (mongos)
  - src/mongo/db/sharding_environment/sharding_server_status.cpp (mongod)
Task
Please audit the shardingStatistics serverStatus section and remove any unnecessary blocking work (locks, synchronous I/O, contended atomics, etc.).
Use the SKUNK-40 branch for reference implementations and SKUNK-40-nonblocking for nonblocking annotations.
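For illustration only, one common way to take a mutex off a statistics read path is to keep each counter in its own atomic and have the reporter load them with relaxed ordering. The sketch below is not taken from the MongoDB source or from the SKUNK-40 branch. The names `RoutingCounters`, `RoutingCountersSnapshot`, `refreshesStarted`, and `refreshesFailed` are hypothetical stand-ins, and the real section may need a different approach (for example, if fields must be mutually consistent).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical plain-value copy of the counters, as a reporter would consume it.
struct RoutingCountersSnapshot {
    std::int64_t refreshesStarted;
    std::int64_t refreshesFailed;
};

// Hypothetical statistics holder. Writers and the reporter never take a mutex.
class RoutingCounters {
public:
    // Writers bump counters with relaxed ordering: only the count matters,
    // not its ordering relative to other memory operations.
    void onRefreshStarted() {
        _refreshesStarted.fetch_add(1, std::memory_order_relaxed);
    }

    void onRefreshFailed() {
        _refreshesFailed.fetch_add(1, std::memory_order_relaxed);
    }

    // The reporter copies each counter independently. The copy is not an
    // atomic snapshot across fields, which is an assumption this sketch makes
    // about what monitoring output can tolerate.
    RoutingCountersSnapshot snapshot() const {
        return {_refreshesStarted.load(std::memory_order_relaxed),
                _refreshesFailed.load(std::memory_order_relaxed)};
    }

private:
    std::atomic<std::int64_t> _refreshesStarted{0};
    std::atomic<std::int64_t> _refreshesFailed{0};
};
```

A single shared atomic can still be a contended cache line under heavy write load, so this pattern addresses the blocking-lock concern but not necessarily the contended-atomics one.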
Acceptance criteria
- shardingStatistics section generation does not acquire blocking locks on the hot path
- No regression in section output for FTDC/default serverStatus collection
- Add or update tests if behavior changes
Related work
- Parent analysis: SERVER-124926
- Prior work: SERVER-74720
- Is related to: SERVER-74720 The default 'shardingStatistics' serverStatus section takes locks (Closed)