-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Replication
-
Replication
I did some reading on how Cassandra does its internal checking and it implements a phi-accrual detection algorithm which is more sensitive to dynamic network conditions than a simple heartbeat. It also provides a scalar failure measurement instead of a binary yes/no detection which allows for configuration of tolerance levels.
See:
http://ddg.jaist.ac.jp/pub/HDY+04.pdf
There are pros/cons (particularly around simplicity), but I'd be curious what you at 10gen think about the appropriateness/usefulness of basing your failure detection off of this kind of a protocol.