Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Cluster Scalability
3
It would be useful to know how far behind the recipient is in applying oplog entries during the applying phase of resharding.
The recipient could keep track of the most recently seen oplog timestamp and periodically report a metric equal to (currentTimestamp - most recent oplog timestamp).
This will not be exact if there is clock skew between the donor and the recipient, but if Atlas uses an NTP server it should be good enough.
Useful lag metrics:
- (oplogFetched - oplogApplied) metric. Tells us how many fetched oplog entries the recipient still has to apply to catch up.
- (currentTimestamp - most recent fetched oplog timestamp) metric. Tells us how far behind the oplog fetcher is in fetching oplog entries from the donor.
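
A minimal sketch of how the two metrics above could be tracked on the recipient. This is illustrative Python, not MongoDB's implementation: the class name `ReshardingLagTracker` and all of its methods are hypothetical, oplog timestamps are simplified to wall-clock seconds, and clamping the time lag at zero is an assumption made here to absorb clock skew.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReshardingLagTracker:
    """Hypothetical recipient-side tracker for the two lag metrics."""

    oplog_fetched: int = 0  # entries fetched from the donor so far
    oplog_applied: int = 0  # entries applied on the recipient so far
    last_fetched_ts: Optional[float] = None  # newest fetched entry, in seconds

    def on_fetched(self, entry_ts_secs: float) -> None:
        """Record one fetched oplog entry and remember the newest timestamp."""
        self.oplog_fetched += 1
        if self.last_fetched_ts is None or entry_ts_secs > self.last_fetched_ts:
            self.last_fetched_ts = entry_ts_secs

    def on_applied(self) -> None:
        """Record one applied oplog entry."""
        self.oplog_applied += 1

    def apply_backlog(self) -> int:
        """(oplogFetched - oplogApplied): entries still waiting to be applied."""
        return self.oplog_fetched - self.oplog_applied

    def fetch_lag_secs(self, now_secs: float) -> Optional[float]:
        """(currentTimestamp - most recent fetched oplog timestamp), in seconds.

        Returns None until at least one entry has been fetched.
        """
        if self.last_fetched_ts is None:
            return None
        # Assumption: clamp at zero, since clock skew can make the donor's
        # timestamp appear to be ahead of the recipient's clock.
        return max(0.0, now_secs - self.last_fetched_ts)
```

A periodic job on the recipient could call `apply_backlog()` and `fetch_lag_secs(time.time())` and publish the results as metrics. Keeping the count-based and time-based values separate makes it possible to tell a slow applier apart from a slow fetcher.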