Implement Phi Accrual Failure Detection for detecting Node Failure

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Replication
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      I did some reading on how Cassandra does its internal checking and it implements a phi-accrual detection algorithm which is more sensitive to dynamic network conditions than a simple heartbeat. It also provides a scalar failure measurement instead of a binary yes/no detection which allows for configuration of tolerance levels.

      See:
      http://ddg.jaist.ac.jp/pub/HDY+04.pdf

      There are pros/cons (particularly around simplicity), but I'd be curious what you at 10gen think about the appropriateness/usefulness of basing your failure detection off of this kind of a protocol.

              Assignee:
              [DO NOT USE] Backlog - Replication Team
              Reporter:
              Caleb Jones
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Created:
                Updated: