[COMPASS-7575] Investigate changes in SERVER-84440: Expose the number of replication waiters in serverStatus
Created: 11/Jan/24  Updated: 22/Jan/24  Resolved: 16/Jan/24

Status: Closed
Project: Compass
Component/s: None
Affects Version/s: None
Fix Version/s: No version

Type: Investigation Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Rhys Howell
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-84440 Expose the number of replication wait... Closed
Assigned Teams:
Developer Tools
Documentation Changes: Not Needed

 Description   
Original Downstream Change Summary

This adds two metrics to the serverStatus.metrics section:

repl.waiters.replication

repl.waiters.opTime

repl.waiters.replication exposes how many threads are waiting for a replicated and/or journaled write concern to resolve. repl.waiters.opTime exposes how many threads are waiting for a local optime only.
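
For readers who want to look at the new counters, here is a minimal Python sketch that pulls them out of a serverStatus-style document. It assumes the values sit at metrics.repl.waiters.replication and metrics.repl.waiters.opTime, which is how the summary above reads. The helper name read_repl_waiters and the sample document are hypothetical and only for illustration.

```python
from typing import Any, Dict, Mapping, Optional


def read_repl_waiters(server_status: Mapping[str, Any]) -> Dict[str, Optional[int]]:
    """Extract the two waiter counts from a serverStatus-style document.

    Assumption: the counters live under metrics.repl.waiters, as the
    summary above describes. A server without the SERVER-84440 change
    would lack that section, so missing values are returned as None.
    """
    waiters = server_status.get("metrics", {}).get("repl", {}).get("waiters", {})
    return {
        "replication": waiters.get("replication"),
        "opTime": waiters.get("opTime"),
    }


# Hypothetical sample document; the numbers are made up for illustration.
sample = {"metrics": {"repl": {"waiters": {"replication": 3, "opTime": 1}}}}
print(read_repl_waiters(sample))
```

Against a live deployment, the input document would be the result of the serverStatus command (for example db.serverStatus() in the shell). Returning None for absent fields keeps the helper usable against servers that predate the change.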

Description of Linked Ticket

The replication waiters list can grow with the number of operations waiting for write concern. Advancing replication timestamps also requires updating all waiters in this list under a mutex. If the list is long, this can take a long time.

It would be useful to be able to see how many operations are waiting for replication in this state, which would make it easier to diagnose problems in this area.



 Comments   
Comment by Rhys Howell [ 16/Jan/24 ]

No devtools product changes needed. We expose the results of serverStatus in the shell, and in Compass we use it to show some information on the performance page. We don't do any parsing that this change would affect, and it doesn't sound like we want to add explicit parsing to show these metrics.

Comment by PM Bot [ 11/Jan/24 ]

Fix Version updated for upstream SERVER-84440:
7.3.0-rc0

Generated at Wed Feb 07 22:46:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.