Utilize jitter when trying to rediscover a host after a failed monitoring request


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • 3

      After a large network disruption, the RSM attempts to rediscover hosts whose monitoring connections were severed. In large clusters, this can produce a burst of monitoring connection establishments, which may cause network congestion, DNS server overload, or contention on the RSM's reactor thread. Applying a randomized delay (jitter) when scheduling the first monitoring request after a previously monitored server is marked Unknown would spread these attempts out over time and could help mitigate these issues.
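      The proposal could be sketched roughly as follows. This is only an illustration of the idea, not the RSM's actual scheduling code: the function names, the uniform distribution, and the 500 ms jitter window are all assumptions made for the example.

```python
import random


def jittered_delay_ms(max_jitter_ms, rng):
    """Return a uniformly random delay in [0, max_jitter_ms] milliseconds.

    Hypothetical helper: max_jitter_ms is an assumed tuning knob, not an
    existing server parameter.
    """
    if max_jitter_ms < 0:
        raise ValueError("max_jitter_ms must be non-negative")
    return rng.uniform(0, max_jitter_ms)


def schedule_rediscovery(unknown_hosts, max_jitter_ms, rng):
    """Map each host marked Unknown to the delay before its first
    monitoring request, so the requests do not all fire at once."""
    return {host: jittered_delay_ms(max_jitter_ms, rng) for host in unknown_hosts}


if __name__ == "__main__":
    # Hypothetical host names; a seeded generator keeps the run repeatable.
    hosts = ["node-a:27017", "node-b:27017", "node-c:27017"]
    delays = schedule_rediscovery(hosts, max_jitter_ms=500.0, rng=random.Random(1))
    for host, delay in sorted(delays.items()):
        print(f"{host}: first monitoring request in {delay:.1f} ms")
```

      With no jitter, every severed host would be retried at the same instant. Drawing each delay independently spreads connection establishment (and the DNS lookups that go with it) across the jitter window, at the cost of slightly slower rediscovery for some hosts.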

            Assignee: Unassigned
            Reporter: Patrick Freed
            Votes: 0
            Watchers: 3
