  1. Core Server
  2. SERVER-8148

Implement Phi Accrual Failure Detection for detecting Node Failure

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Replication
    • Labels:

      Description

      I did some reading on how Cassandra does its internal failure checking. It implements a phi-accrual detection algorithm, which adapts to changing network conditions better than a simple heartbeat with a fixed timeout. It also outputs a continuous suspicion value (phi) instead of a binary yes/no verdict, which allows tolerance levels to be configured.

      See:
      http://ddg.jaist.ac.jp/pub/HDY+04.pdf

      There are pros and cons (particularly around simplicity), but I'd be curious what you at 10gen think about the appropriateness and usefulness of basing your failure detection on this kind of protocol.
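
      For concreteness, here is a minimal sketch of how a phi value could be computed from heartbeat arrival times. It is not taken from Cassandra or MongoDB: the class and method names are hypothetical, and it assumes exponentially distributed inter-arrival times as a simplification (the linked paper estimates the distribution with a normal model instead). Under that assumption, phi = -log10(P(next heartbeat arrives later than the elapsed time)) reduces to elapsed / (mean_interval * ln 10).

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Illustrative phi-accrual detector; all names here are hypothetical."""

    def __init__(self, window_size=1000):
        # Sliding window of recent heartbeat inter-arrival intervals.
        self.intervals = deque(maxlen=window_size)
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Return the suspicion level at time `now`; higher means more suspect."""
        if not self.intervals:
            return 0.0  # not enough history to estimate anything
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0  # degenerate history; treat as "no estimate"
        elapsed = max(0.0, now - self.last_heartbeat)
        # Assumed exponential model: P_later = exp(-elapsed / mean),
        # so -log10(P_later) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)

threshold = 8.0  # assumed tolerance level, chosen only for illustration
suspect_soon = detector.phi(now=4.0) > threshold    # one interval late
suspect_later = detector.phi(now=30.0) > threshold  # 27 intervals late
```

      With heartbeats arriving one second apart, phi stays well below the threshold one second after the last heartbeat and exceeds it after a long silence. The threshold is the configurable tolerance level: raising it trades slower detection for fewer false suspicions.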

            People

            • Votes: 0
            • Watchers: 8
