[SERVER-76581] Server can return stale topologyVersion to clients on stepdown due to heartbeats Created: 26/Apr/23  Updated: 29/Oct/23  Resolved: 18/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Jason Chan Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-53431 Server should respond running operati... Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2023-07-24
Participants:

 Description   

I think it is currently possible for the server, when it steps down because of heartbeats, to kill operations and return a stale TopologyVersion, because we don't call fulfillTopologyPromise (as part of _updateMemberStateFromTopologyCoordinator) until after the operations have been killed.
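The ordering problem can be illustrated with a toy model. This is a minimal Python sketch, not the server's code or the committed fix: `Node`, `fulfill_topology_change`, `kill_operations` and `step_down` are hypothetical stand-ins, and topologyVersion is reduced to its counter. The assumption being modelled is that each killed operation's error is stamped with whatever counter is current at kill time, so killing before fulfilling the topology change leaks the old value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica set member; all names here are hypothetical, not MongoDB's."""
    counter: int = 0  # stands in for topologyVersion.counter
    in_flight: list = field(default_factory=list)

    def fulfill_topology_change(self) -> None:
        # Hypothetical stand-in for fulfilling the topology change promise:
        # in this model it simply bumps the counter.
        self.counter += 1

    def kill_operations(self) -> list:
        # Each interrupted operation gets an error stamped with the counter
        # that is current at the moment it is killed.
        errors = [("InterruptedDueToReplStateChange", self.counter) for _ in self.in_flight]
        self.in_flight.clear()
        return errors


def step_down(node: Node, fulfill_first: bool) -> list:
    """Return the errors sent to clients for the operations killed on stepdown."""
    if fulfill_first:
        node.fulfill_topology_change()
        return node.kill_operations()
    # Order described in this ticket: kill first, bump the version afterwards.
    errors = node.kill_operations()
    node.fulfill_topology_change()
    return errors
```

With a node at counter 5 and one in-flight write, `step_down(node, fulfill_first=False)` returns an error stamped with 5 (the stale value), while fulfilling first returns an error stamped with 6.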

This causes problems for retryable writes: if a server returns an InterruptedDueToReplStateChange error carrying topologyVersion N (unchanged), the driver will not know to mark the server as Unknown, and will retry the same write against the same node, which is no longer a primary.
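The driver side follows from the stale-error rule in the Server Discovery and Monitoring (SDAM) specification: a state-change error whose topologyVersion is not newer than the one the driver already holds is treated as stale and ignored. The following is a minimal Python sketch modelled on that rule; `TopologyVersion`, `compare`, `is_stale_error` and `handle_state_change_error` are hypothetical names, not any driver's actual API, and real drivers do more (for example, clearing connection pools).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TopologyVersion:
    process_id: str
    counter: int


def compare(current: Optional[TopologyVersion], response: Optional[TopologyVersion]) -> int:
    """Negative if the response is newer (or not comparable), 0 if equal, positive if older."""
    if current is None or response is None or current.process_id != response.process_id:
        return -1
    return (current.counter > response.counter) - (current.counter < response.counter)


def is_stale_error(current: Optional[TopologyVersion], response: Optional[TopologyVersion]) -> bool:
    # An error is stale when its topologyVersion is not greater than the known one.
    return compare(current, response) >= 0


def handle_state_change_error(server_type: str, current, response) -> str:
    """Hypothetical handler for InterruptedDueToReplStateChange; returns the new server type."""
    if is_stale_error(current, response):
        return server_type  # stale: ignored, so the server keeps its old type
    return "Unknown"  # newer: the driver marks the server Unknown before retrying
```

If the node already advertised counter 7 and the error also carries counter 7, the error is ignored and the server stays `RSPrimary`, so the retry goes back to the same node; an error carrying counter 8 would mark it `Unknown`.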



 Comments   
Comment by Githook User [ 17/Jul/23 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@mongodb.com>

Message: SERVER-76581 Server can return stale topologyVersion to clients on stepdown due to heartbeats
Branch: master
https://github.com/mongodb/mongo/commit/76c01d4b4230bc75dbf22b1385015f88fbeab374

Comment by Jason Chan [ 26/Apr/23 ]

I think this is the same problem as SERVER-53431, but in that patch we only fixed the stepdown command code path.

Comment by Jason Chan [ 26/Apr/23 ]

I think the same problem might exist in the stepdown-due-to-reconfig code path. We might want to audit for any cases where we acquire the RSTL and kill operations before fulfilling the topology change promise.

Generated at Thu Feb 08 06:33:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.