Reduce the alerting threshold for replication lag

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Replication

      We currently alert on replication lag by opening a PROACTIVE ticket (e.g. https://jira.mongodb.org/browse/PROACTIVE-118019). The threshold for this alert is multiple days, which is at least an order of magnitude above our target for maximum replication lag and far too long to catch problems before customers notice. We should determine the lowest reasonable threshold and reduce the alert to that value.
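
      To make the proposed check concrete, here is a minimal sketch of a lag-versus-threshold comparison. It is not the existing alerting code. It assumes member records shaped loosely like the `members` array returned by `replSetGetStatus` (`name`, `stateStr`, `optimeDate`). The function name `lagging_members`, the sample hosts, and the `ALERT_THRESHOLD` value are hypothetical placeholders; the actual threshold is what this ticket asks us to determine.

```python
from datetime import datetime, timedelta


def lagging_members(members, threshold):
    """Return (name, lag) pairs for secondaries whose lag exceeds threshold.

    Lag is measured as the primary's optimeDate minus the secondary's.
    """
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    lagging = []
    for member in members:
        if member["stateStr"] != "SECONDARY":
            continue
        lag = primary["optimeDate"] - member["optimeDate"]
        if lag > threshold:
            lagging.append((member["name"], lag))
    return lagging


# Hypothetical sample data, not taken from a real deployment.
now = datetime(2024, 1, 1, 12, 0, 0)
members = [
    {"name": "node-a:27017", "stateStr": "PRIMARY", "optimeDate": now},
    {"name": "node-b:27017", "stateStr": "SECONDARY",
     "optimeDate": now - timedelta(seconds=5)},
    {"name": "node-c:27017", "stateStr": "SECONDARY",
     "optimeDate": now - timedelta(hours=6)},
]

# Placeholder value for illustration only, not a recommended threshold.
ALERT_THRESHOLD = timedelta(hours=1)

print(lagging_members(members, ALERT_THRESHOLD))
```

      With a threshold of multiple days, the six-hour lag on the third sample member would not be reported, which is the gap described above.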

            Assignee:
            Amirsaman Memaripour
            Reporter:
            Thomas Goyne
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: