Core Server / SERVER-102671

We should track an oplog application lag metric in resharding

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • 3

      It would be useful to know how far behind the recipient is in applying oplog entries during the applying phase of resharding.

      The recipient could keep track of the most recently seen oplog timestamp and periodically report a metric equal to (currentTimestamp - most recent oplog timestamp).

      This will not be exact if there is clock skew between the donor and recipient, but if Atlas uses an NTP server it should be good enough.
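
      A minimal sketch of the idea above, in Python for brevity rather than in the server's own code. The class and method names are hypothetical, timestamps are assumed to be wall-clock seconds, and clamping negative values to zero is one possible way to tolerate small clock skew, not a decided behavior:

```python
import threading
from typing import Optional


class OplogApplicationLagTracker:
    """Hypothetical helper: remembers the newest oplog timestamp seen by the recipient."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latest_seen_secs: Optional[float] = None

    def on_entry_seen(self, oplog_ts_secs: float) -> None:
        # Keep only the most recent timestamp; older entries never move it backwards.
        with self._lock:
            if self._latest_seen_secs is None or oplog_ts_secs > self._latest_seen_secs:
                self._latest_seen_secs = oplog_ts_secs

    def lag_secs(self, now_secs: float) -> Optional[float]:
        # Returns None until at least one oplog entry has been seen.
        with self._lock:
            latest = self._latest_seen_secs
        if latest is None:
            return None
        # Assumption: clock skew can make the difference negative, so clamp to zero.
        return max(0.0, now_secs - latest)
```

      A periodic job on the recipient could call lag_secs(time.time()) and publish the result as the metric.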

      Useful lag metrics:

      • (oplogFetched - oplogApplied) metric. Tells us how much work the recipient has left to catch up on applying the oplog entries it has already fetched.
      • (currentTimestamp - most recent fetched oplog timestamp) metric. Tells us how far behind the oplog fetcher is in fetching oplog entries.
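
      The two metrics listed above could be derived from a small progress snapshot. This is only an illustrative Python sketch: the type and field names are hypothetical, oplogFetched and oplogApplied are assumed to be entry counts, and the timestamp is assumed to be wall-clock seconds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReshardingOplogProgress:
    """Hypothetical snapshot of a recipient's oplog fetch/apply progress."""

    oplog_fetched: int            # entries fetched from the donor so far
    oplog_applied: int            # entries applied on the recipient so far
    last_fetched_ts_secs: float   # timestamp of the most recently fetched entry

    def entries_pending_application(self) -> int:
        # (oplogFetched - oplogApplied): work the applier still has to do.
        return max(0, self.oplog_fetched - self.oplog_applied)

    def fetch_lag_secs(self, now_secs: float) -> float:
        # (currentTimestamp - most recent fetched oplog timestamp): how far
        # behind the fetcher is. Clamped at zero to tolerate clock skew.
        return max(0.0, now_secs - self.last_fetched_ts_secs)
```

      Reporting the two values separately helps distinguish a slow applier (many pending entries) from a slow fetcher (large fetch lag).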

            Assignee: Unassigned
            Reporter: Ben Gawel (ben.gawel@mongodb.com)
