Core Server / SERVER-8148

Implement Phi Accrual Failure Detection for detecting Node Failure

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels:
    • Replication

      I did some reading on how Cassandra handles failure detection internally: it implements the phi accrual failure detection algorithm, which adapts to changing network conditions better than a simple heartbeat. Instead of a binary yes/no verdict, it produces a scalar suspicion level, which allows tolerance levels to be configured.

      See:
      http://ddg.jaist.ac.jp/pub/HDY+04.pdf
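
To make the idea concrete, here is a minimal sketch of a phi accrual detector. It is not MongoDB or Cassandra code: the class name, the window size, and the `min_std` floor are all hypothetical, and it assumes heartbeat inter-arrival times are roughly normally distributed. Phi is computed as the negative base-10 logarithm of the probability that a heartbeat would still arrive after the time already elapsed.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Illustrative phi accrual detector; names and defaults are hypothetical."""

    def __init__(self, window_size=100, min_std=0.05):
        # Sliding window of recent heartbeat inter-arrival times, in seconds.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        # Floor on the standard deviation so perfectly regular heartbeats
        # do not produce a division by zero.
        self.min_std = min_std

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Return the suspicion level at `now`; 0.0 until samples exist."""
        if not self.intervals:
            return 0.0
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        variance = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(variance), self.min_std)
        elapsed = now - self.last_arrival
        # Probability that the next heartbeat arrives later than `elapsed`,
        # under the assumed normal model (upper tail of the distribution).
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # guard against log10(0)
        return -math.log10(p_later)


if __name__ == "__main__":
    detector = PhiAccrualDetector()
    for t in range(11):  # heartbeats once per second
        detector.heartbeat(float(t))
    for now in (10.5, 11.1, 11.2, 12.0):
        print(now, round(detector.phi(now), 2))
```

A caller would compare `phi(now)` against a threshold it chooses: a higher threshold tolerates more jitter but takes longer to declare a node down, which is the configurable tolerance mentioned above.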

      There are pros/cons (particularly around simplicity), but I'd be curious what you at 10gen think about the appropriateness/usefulness of basing your failure detection off of this kind of a protocol.

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            caleb.jones@disney.com Caleb Jones
            Votes:
            0
            Watchers:
            7

              Created:
              Updated: