-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Cluster Scalability
-
Sharding NYC 2023-09-18, Sharding NYC 2023-10-02, Sharding NYC 2023-10-16, Sharding NYC 2023-10-30
-
113
-
3
The current code has retry loops with no backoff, while failure notification to the replica set monitor (RSM) is asynchronous. As a result, a request can fail, the calling thread calls failedHost on the RSM, and the retry loop immediately issues another request. All of this happens within microseconds, so the next attempt may fail in the same way because not enough time has passed for the RSM to update its view.
This ticket is to improve this behavior by blocking the targeter when an error such as NotPrimary or InterruptedDueToReplStateChange occurs (list not exhaustive). The method that reports the failure to the RSM should return only once a getHostsOrRefresh request to the RSM would return a different result, or once a timeout occurs.
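The proposed blocking behavior could look roughly like the sketch below. It is a minimal illustration, not the server's implementation: `PrimaryView`, `update`, and `reportFailureAndWait` are hypothetical names standing in for the RSM, its asynchronous refresh, and a blocking variant of the failure report, and the real RSM tracks a full topology rather than a single host string.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the replica set monitor's view of the primary.
// None of these names come from the server code base.
class PrimaryView {
public:
    explicit PrimaryView(std::string primary) : _primary(std::move(primary)) {}

    // Called by the (asynchronous) monitoring side when it observes a new primary.
    void update(const std::string& newPrimary) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _primary = newPrimary;
        }
        // Wake every caller blocked in reportFailureAndWait().
        _changed.notify_all();
    }

    // Blocking analogue of reporting a failed host. Returns true as soon as the
    // view no longer points at failedHost (so a retry would target a different
    // host), or false if the timeout expires first.
    bool reportFailureAndWait(const std::string& failedHost,
                              std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _changed.wait_for(lk, timeout,
                                 [&] { return _primary != failedHost; });
    }

    // Current target, analogous to asking the monitor which host to use.
    std::string primary() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _primary;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _changed;
    std::string _primary;
};
```

Under these assumptions, a retry loop that receives NotPrimary would call `reportFailureAndWait` with the host it just tried and issue the next attempt only after the call returns, instead of retrying within microseconds against the same stale target. A `false` return means the view did not change within the timeout, and the caller can decide whether to give up or retry anyway.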
- related to: SERVER-50342 Make version of Shard::runCommand that returns a future (Open)