Type: Task
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- resharding-metrics-improvements

Assigned Teams:

Cluster Scalability
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

It would be useful to know how far behind the recipient is in applying oplog entries during the applying phase of resharding.

The recipient could keep track of the timestamp of the most recently seen oplog entry and periodically update a metric equal to (currentTimestamp - most recent oplog timestamp).

This will not be exact if there is clock skew between the donor and the recipient, but if Atlas synchronizes clocks against an NTP server it should be good enough.
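A minimal sketch of the idea above, in Python for brevity rather than the server's C++. The class name `ApplyLagTracker`, its methods, and the use of millisecond wall-clock values are all assumptions made for illustration; none of them come from the server code. The clamp to zero is one possible way to handle the clock skew mentioned above, where a donor timestamp can appear to be ahead of the recipient's clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApplyLagTracker:
    """Hypothetical helper: remembers the newest oplog wall-clock time seen."""

    latest_oplog_wall_time_ms: Optional[int] = None

    def on_oplog_entry(self, wall_time_ms: int) -> None:
        # Keep only the most recent timestamp; ignore out-of-order older ones.
        if (
            self.latest_oplog_wall_time_ms is None
            or wall_time_ms > self.latest_oplog_wall_time_ms
        ):
            self.latest_oplog_wall_time_ms = wall_time_ms

    def lag_ms(self, now_ms: int) -> Optional[int]:
        # No entry seen yet, so there is nothing meaningful to report.
        if self.latest_oplog_wall_time_ms is None:
            return None
        # Clamp at zero: with clock skew the donor's timestamp may be
        # slightly ahead of the recipient's current time.
        return max(0, now_ms - self.latest_oplog_wall_time_ms)
```

A periodic job on the recipient could call `lag_ms` with its current time and write the result to the metric.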

Useful lag metrics:

- (oplogFetched - oplogApplied) metric. Tells us how much work the recipient still has to do to catch up on applying fetched oplog entries.
- (currentTimestamp - most recent fetched oplog timestamp) metric. Tells us how far behind the oplog fetcher is in fetching oplog entries.
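The two metrics listed above could be derived from a small snapshot of recipient state, sketched here in Python under stated assumptions. `RecipientProgress`, its field names, and the two functions are hypothetical names chosen for this example, and the sketch assumes the counters count oplog entries and that timestamps are wall-clock milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipientProgress:
    """Hypothetical snapshot of one recipient's progress for one donor."""

    oplog_entries_fetched: int
    oplog_entries_applied: int
    latest_fetched_wall_time_ms: int


def apply_backlog(progress: RecipientProgress) -> int:
    # (oplogFetched - oplogApplied): entries fetched but not yet applied.
    return progress.oplog_entries_fetched - progress.oplog_entries_applied


def fetch_lag_ms(progress: RecipientProgress, now_ms: int) -> int:
    # (currentTimestamp - most recent fetched oplog timestamp), clamped at
    # zero in case the donor's clock runs ahead of the recipient's.
    return max(0, now_ms - progress.latest_fetched_wall_time_ms)
```

Reporting the backlog as a count and the fetch lag as a duration keeps the two signals separate: a large backlog with a small fetch lag would point at the applier, while a large fetch lag would point at the fetcher.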

duplicates

SERVER-106057 Make resharding recipient track the exponential moving average of the time it takes to apply an oplog entry fetched from each donor

Closed

Assignee: Unassigned
Reporter: Ben Gawel
Participants: Ben Gawel
Votes: 0
Watchers: 4

Created: Mar 20 2025 03:12:34 PM UTC
Updated: Jun 09 2025 07:37:52 PM UTC
Resolved: Jun 09 2025 07:37:52 PM UTC
