[SERVER-38918] Coordinator should make configOpTime durable before sending prepare Created: 09/Jan/19 Updated: 27/Oct/23 Resolved: 30/Sep/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | ShardedTxn:KnownBugs |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Sprint: | Sharding 2019-08-12, Sharding 2019-08-26 |
| Participants: | |
| Description |
|
...and shard primaries should read the configOpTime and wait for it to become durable on stepup before they resume coordinating commits. Otherwise, a new coordinator primary might assume that a participant shard has received the decision even though it has not. Example:
The suggested fix is to:
1) Make coordinators update the configOpTime in the minOpTimeRecovery document along with writing the participant list, here. (The coordinator then waits for writeConcern, which covers both writes.)
2) Make the "recover pending commits on stepup" task load the persisted configOpTime into memory before waiting for writeConcern. |
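The two steps of the suggested fix can be sketched as a toy Python model. This is only an illustration under stated assumptions, not the server's implementation: `ShardNode`, `coordinate_commit`, `recover_on_step_up`, the key names, and the `(timestamp, term)` representation of an opTime are all hypothetical, and `wait_for_write_concern` simply promotes pending writes to durable state instead of waiting on real replication.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation of an opTime: (timestamp, term).
OpTime = Tuple[int, int]

# Hypothetical keys standing in for the persisted documents.
PARTICIPANTS_KEY = "participantList"
CONFIG_OP_TIME_KEY = "minOpTimeRecovery.configOpTime"


@dataclass
class ShardNode:
    """Toy model of a shard primary acting as transaction coordinator."""
    durable: Dict[str, object] = field(default_factory=dict)  # survives failover
    pending: Dict[str, object] = field(default_factory=dict)  # written, not yet durable
    in_memory_config_op_time: Optional[OpTime] = None

    def write(self, key: str, value: object) -> None:
        # A local write: visible on this node, but not yet durable.
        self.pending[key] = value

    def wait_for_write_concern(self) -> None:
        # Assumption: stands in for waiting until all prior writes are durable.
        self.durable.update(self.pending)
        self.pending.clear()


def coordinate_commit(node: ShardNode, participants: List[str],
                      config_op_time: OpTime,
                      send_prepare: Callable[[str], None]) -> None:
    # Step 1: persist the configOpTime together with the participant list,
    # then wait once so that both writes are durable before any prepare is sent.
    node.in_memory_config_op_time = config_op_time
    node.write(PARTICIPANTS_KEY, list(participants))
    node.write(CONFIG_OP_TIME_KEY, config_op_time)
    node.wait_for_write_concern()
    for shard in participants:
        send_prepare(shard)


def recover_on_step_up(durable_state: Dict[str, object]) -> ShardNode:
    # Step 2: a new primary loads the persisted configOpTime into memory
    # before waiting for write concern and resuming pending commits.
    node = ShardNode(durable=dict(durable_state))
    node.in_memory_config_op_time = node.durable.get(CONFIG_OP_TIME_KEY)
    node.wait_for_write_concern()
    return node
```

In this model, a `send_prepare` callback always observes a durable configOpTime, and a node built by `recover_on_step_up` from the old primary's durable state starts with the same configOpTime in memory.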
| Comments |
| Comment by Kaloian Manassiev [ 30/Sep/20 ] |
|
This has now gone away as a result of the changes done under |
| Comment by Esha Maharishi (Inactive) [ 09/Jan/19 ] |
|
Note: This may be a more general issue than transaction coordination, so we may want to wait for the configOpTime to become durable in a more general place on stepUp. That would delay the transition to primary, though, which is why I have not made it the suggested fix for now. |