Count unreplicated oplog entries on the primary


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Replication

      We want to be able to predict how soon a lagged secondary will catch up. If the write rate on the primary is inconsistent, a secondary might lag by a small amount of time but by many oplog entries. Conversely, it might lag by a large amount of time but by very few oplog entries.

      Our current measure of replication lag compares the latest optime on the primary with the latest optime on a secondary. A more predictive measure would be the count of unreplicated oplog entries. This count should be exported as a metric so that, given N unreplicated entries and an apply rate of X oplog entries per second, we can infer that catching up should take roughly Y = N / X seconds.
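      To illustrate how the two measures can diverge, here is a minimal sketch over a simulated oplog. It assumes each entry is reduced to a single integer timestamp in milliseconds, which simplifies the real optime format. The function names and the sample data are hypothetical and are not part of any server API.

```python
from bisect import bisect_right


def time_lag_ms(primary_ts, secondary_last_ts):
    """Time-based lag: newest primary timestamp minus the secondary's newest."""
    return primary_ts[-1] - secondary_last_ts


def entry_lag(primary_ts, secondary_last_ts):
    """Entry-based lag: primary entries strictly newer than the secondary's newest.

    primary_ts must be sorted ascending, as an oplog is.
    """
    return len(primary_ts) - bisect_right(primary_ts, secondary_last_ts)


# Hypothetical burst: 5001 entries written 1 ms apart, secondary has only the first.
burst = [1000 + i for i in range(5001)]

# Hypothetical trickle: one entry per minute, secondary has only the first.
sparse = [0, 60_000, 120_000, 180_000]

burst_time, burst_entries = time_lag_ms(burst, 1000), entry_lag(burst, 1000)
sparse_time, sparse_entries = time_lag_ms(sparse, 0), entry_lag(sparse, 0)
```

      Under these assumptions the burst case is 5 seconds behind by time but 5000 entries behind by count, while the trickle case is 180 seconds behind by time but only 3 entries behind.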

      This relies on the assumption that each burst of writes on the primary is roughly homogeneous, so that each group of X oplog entries takes about the same amount of time to apply. Even if this isn't strictly true, the same caveat applies to today's measure, so it would still be useful to drop the assumption that the write rate is consistent when we estimate catchup time.
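      The estimate itself could be sketched as follows. This is only an illustration under the homogeneity assumption above. The function names are hypothetical, and the apply rate is assumed to come from two samples of a counter of applied oplog entries on the secondary.

```python
def apply_rate_per_second(applied_before, t_before, applied_after, t_after):
    """Average apply rate (entries/second) between two samples of an applied-entry counter."""
    elapsed = t_after - t_before
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (applied_after - applied_before) / elapsed


def estimate_catchup_seconds(unreplicated_entries, apply_rate):
    """Estimate Y = N / X, assuming every entry costs about the same to apply.

    Returns None when the secondary is not applying anything, since no
    finite estimate exists in that case.
    """
    if unreplicated_entries <= 0:
        return 0.0
    if apply_rate <= 0:
        return None
    return unreplicated_entries / apply_rate


# Hypothetical samples: 2000 entries applied over 2 seconds, 5000 still unreplicated.
rate = apply_rate_per_second(0, 0.0, 2000, 2.0)
eta = estimate_catchup_seconds(5000, rate)
```

      Note that this sketch ignores writes that arrive on the primary while the secondary is catching up, so it should be read as a lower bound on the catchup time.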

            Assignee:
            Unassigned
            Reporter:
            Brad Cater
            Votes:
            0
            Watchers:
            9

              Created:
              Updated: