Summary
Some software libraries and services provide a "circuit breaker" mechanism, which can selectively disable calls to remote systems on demand to prevent cascading failures when part of a distributed system has an outage.
Background
Here are some quotes from helpful resources that explain the concept in more detail:
From Martin Fowler's blog post on circuit breakers:
The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. Usually you'll also want some kind of monitor alert if the circuit breaker trips.
From New Relic's blog post The Circuit Breaker Pattern Is A Great Tool (When Used Appropriately):
Developers can use a circuit breaker to prevent a resource dependency (typically a downstream HTTP service or database) from becoming overloaded. The circuit trips open automatically based on configured settings, like elevated response time, timeouts, or other errors, and then automatically closes (again, based on configurations such as elapsed time or some other trigger), ideally after the dependency has recovered. In some cases, this circuit breaker pattern can help you reduce overall downtime if you allow the dependencies to recover on their own before you start hammering on them.
From Netflix's circuit-breaker library Hystrix:
Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.
Definition of done
- Research and document use cases for circuit breakers in software and services that use MongoDB.
- Answer the open questions below.
- Implement a proof of concept of a circuit breaker API in the Go driver.
Open questions:
- Should we add a circuit-breaking feature to all MongoDB drivers?
- MongoDB drivers (including the Go driver) already have features similar to a "circuit breaker", like the pausable connection pool, that prevent sending operations to MongoDB nodes that are in specific degraded states. What are the use cases for adding externally controllable circuit-breaking logic?
- Should the circuit breaker heuristics be built into drivers or controllable externally via arbitrary heuristics (e.g. a callback that controls the circuit breaker)?