- Type: Task
- Resolution: Gone away
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Replication
- None
- Repl 2020-02-10
When the server responds with a State Change Error from the failCommand failpoint, it should also increase topologyVersion and respond to waiting isMasters. The Drivers team uses failCommand extensively in spec tests for retryable writes and reads. Without this change, it takes the client ~10 seconds (maxAwaitTimeMS) to rediscover the server's state.
For example:
- client configures a failCommand with NotMaster
- client runs a retryable write against Primary P
- client observes a NotMaster error and sets P to Unknown
- client runs the retry attempt which blocks until P is rediscovered
- P's Monitor is blocked for 10 seconds waiting for an awaitable isMaster response
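For reference, the first step above can be sketched as a command document. This is a minimal sketch, not taken from any spec test: the helper name `fail_command_not_master` is hypothetical, the choice of `insert` as the failing command is an assumption, and 10107 is the server error code for NotMaster.

```python
NOT_MASTER = 10107  # server error code for NotMaster


def fail_command_not_master(commands, times=1):
    """Build a configureFailPoint document that makes the listed
    commands fail with NotMaster for the next `times` invocations.

    Hypothetical helper for illustration only.
    """
    return {
        # The command name must be the first key of the document.
        "configureFailPoint": "failCommand",
        "mode": {"times": times},
        "data": {
            "failCommands": list(commands),
            "errorCode": NOT_MASTER,
        },
    }


cmd = fail_command_not_master(["insert"])
```

With a driver such as PyMongo, a document like this would typically be sent to the admin database of the target server, for example via `client.admin.command(cmd)`, assuming the server was started with test commands enabled.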
After this change, the 10-second hang should be removed:
- client configures a failCommand with NotMaster
- client runs a retryable write against Primary P
- client observes a NotMaster error and sets P to Unknown
- client runs the retry attempt which blocks until P is rediscovered
- P's Monitor immediately receives an awaitable isMaster response and sets P to Primary
- client retry attempt succeeds ASAP
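The difference between the two sequences can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyServer`, its methods, and `monitor_wait_seconds` are hypothetical names, a condition variable stands in for the server's awaitable isMaster machinery, and the await timeout is shortened from 10 seconds to 0.5 seconds so the example runs quickly.

```python
import threading
import time


class ToyServer:
    """Toy stand-in for a server with an awaitable isMaster (not real server code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.topology_version = 0

    def await_is_master(self, known_version, max_await_s):
        # Block until topology_version moves past the version the monitor
        # already knows, or until max_await_s elapses.
        with self._cond:
            self._cond.wait_for(
                lambda: self.topology_version > known_version,
                timeout=max_await_s,
            )
            return self.topology_version

    def fail_with_state_change_error(self, bump_topology_version):
        # Models failCommand returning NotMaster. With the proposed change,
        # the server also bumps topologyVersion and wakes waiting isMasters.
        with self._cond:
            if bump_topology_version:
                self.topology_version += 1
                self._cond.notify_all()


def monitor_wait_seconds(bump, max_await_s=0.5):
    """Return how long a monitor thread stays blocked in await_is_master."""
    server = ToyServer()
    result = {}

    def monitor():
        start = time.monotonic()
        server.await_is_master(known_version=0, max_await_s=max_await_s)
        result["elapsed"] = time.monotonic() - start

    thread = threading.Thread(target=monitor)
    thread.start()
    time.sleep(0.05)  # let the monitor start waiting before the error fires
    server.fail_with_state_change_error(bump)
    thread.join()
    return result["elapsed"]
```

In this model, `monitor_wait_seconds(False)` blocks for roughly the full await timeout, while `monitor_wait_seconds(True)` returns shortly after the simulated error, which mirrors the hang and its removal in the sequences above.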