[SERVER-76581] Server can return stale topologyVersion to clients on stepdown due to heartbeats Created: 26/Apr/23 Updated: 29/Oct/23 Resolved: 18/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jason Chan | Assignee: | Matthew Russotto |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | repl-shortlist | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: | Replication | ||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Repl 2023-07-24 | ||||||||
| Participants: | |||||||||
| Description |
|
Currently, I think it's possible for the server to kill operations as part of a stepdown due to heartbeats and return a stale topologyVersion, because we don't call fulfillTopologyPromise (as part of _updateMemberStateFromTopologyCoordinator) until after we kill the operations. This causes issues with retryable writes: if a server returns an InterruptedDueToReplStateChange error with topologyVersion N (unchanged), the driver will not know to mark the server as Unknown before retrying, so it sends the same write to the same node, which is no longer a primary. |
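To make the ordering issue concrete, here is a minimal toy model in Python. It is only a sketch and not the server's or any driver's actual code: `ToyServer`, `step_down`, `bump_before_kill`, and `driver_marks_unknown` are hypothetical names, and the staleness rule is a simplified assumption (real drivers also compare a process id alongside the counter). It shows that an operation killed before the topologyVersion is advanced reports the old value, which the driver cannot tell apart from what it already knows.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Hypothetical stand-in for a replica set node (illustration only)."""
    topology_version: int = 1
    is_primary: bool = True

    def _become_secondary(self) -> None:
        # Models the state change plus fulfilling the topology change promise.
        self.is_primary = False
        self.topology_version += 1

    def step_down(self, bump_before_kill: bool) -> dict:
        """Step down and return the error seen by an interrupted operation."""
        if bump_before_kill:
            self._become_secondary()
        # The killed operation reports whatever topologyVersion is current now.
        error = {
            "codeName": "InterruptedDueToReplStateChange",
            "topologyVersion": self.topology_version,
        }
        if not bump_before_kill:
            # Ordering described in this ticket: version advances after the kill.
            self._become_secondary()
        return error


def driver_marks_unknown(known_version: int, error: dict) -> bool:
    """Assumed rule: act on the error only if its version is newer than ours."""
    return error["topologyVersion"] > known_version


# Kill first, bump later: the error carries version 1, so the driver ignores it.
stale = ToyServer().step_down(bump_before_kill=False)
print(driver_marks_unknown(1, stale))

# Bump first, kill later: the error carries version 2, so the driver reacts.
fresh = ToyServer().step_down(bump_before_kill=True)
print(driver_marks_unknown(1, fresh))
```

In this model the node ends up as a secondary in both cases; only the version attached to the error differs, and that is what determines whether the retry goes back to the same node.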
| Comments |
| Comment by Githook User [ 17/Jul/23 ] |
|
Author: {'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'} Message: |
| Comment by Jason Chan [ 26/Apr/23 ] |
|
I think this is the same problem as |
| Comment by Jason Chan [ 26/Apr/23 ] |
|
I think the same problem might exist in the stepdown due to reconfig code path. We might want to audit for any cases where we acquire the RSTL and kill operations before fulfilling the topology change promise.