[SERVER-84467] Add duration of how long an oplog slot is kept open to FTDC
Created: 02/Jan/24  Updated: 08/Jan/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Moustafa Maher
Assignee: Backlog - Replication Team
Resolution: Unresolved
Votes: 0
Labels: repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-84449 High load can cause replication lag l... Open
related to SERVER-70155 Add duration of how long an oplog slo... Closed
Assigned Teams:
Replication
Participants:

 Description   

While investigating high lastDurable lag in HELP-53772, we could not tell whether it was caused by an oplog hole or whether the lastDurable calculation was simply taking longer because of an increased number of sessions. Having the oplog slot open duration in FTDC would have helped us rule out the first possibility.



 Comments   
Comment by Lingzhi Deng [ 02/Jan/24 ]

SERVER-70155 added a new metric, "totalOplogSlotDurationMicros", to slow query messages. Would that be helpful here? Additionally, newer versions of t2 show "oplog wt visibility lag", which is calculated as the gap between the timestamp of the latest oplog entry and the timestamp of the visibility point. Note that this is not exactly how long an oplog slot has been kept open: if no new oplog entry is being written (i.e. the latest oplog entry is not changing), the "oplog wt visibility lag" shown in t2 could be constant. Would "totalOplogSlotDurationMicros" in slow query messages and "oplog wt visibility lag" in t2 together be sufficient?
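
To illustrate the caveat above, here is a minimal sketch (not the actual t2 or server code) of why the visibility lag can stay constant while a slot remains open. The names Sample, visibility_lag_secs, and slot_open_secs are hypothetical, timestamps are simplified to whole seconds, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical diagnostic sample; all fields are simplified to seconds."""
    wall_clock_secs: int       # when the sample was taken
    latest_oplog_ts_secs: int  # timestamp of the latest oplog entry
    visibility_ts_secs: int    # timestamp of the oplog visibility point


def visibility_lag_secs(s: Sample) -> int:
    # Gap between the latest oplog entry and the visibility point,
    # as described in the comment above.
    return s.latest_oplog_ts_secs - s.visibility_ts_secs


def slot_open_secs(s: Sample, slot_opened_at_secs: int) -> int:
    # How long a slot has actually been held open at sample time
    # (the quantity this ticket asks to expose; assumed here to be known).
    return s.wall_clock_secs - slot_opened_at_secs


# Made-up scenario: a slot is opened at t=95 and no new oplog entries
# are written after t=100, so neither timestamp advances between samples.
samples = [Sample(100, 100, 95), Sample(110, 100, 95), Sample(120, 100, 95)]

lags = [visibility_lag_secs(s) for s in samples]
open_durations = [slot_open_secs(s, slot_opened_at_secs=95) for s in samples]
```

In this made-up scenario the lag values do not change between samples, while the open duration keeps growing, which is the gap a direct slot-duration metric would close.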

Generated at Thu Feb 08 06:55:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.