[SERVER-57186] Catchup takeover should not happen when last applied optime is in current term
Created: 25/May/21 | Updated: 29/Oct/23 | Resolved: 14/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | former-quick-wins | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v5.0 | ||||||||
| Sprint: | Repl 2021-06-28 | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
We currently initiate catchup takeover when we receive a heartbeat, no election is in progress, and the primary's optime is behind a secondary node's optime. In a chaining situation, the primary's optime could be staler than our own optime, because we receive writes through a different path (the OplogFetcher) and do not update the primary's optime based on them. This causes us to initiate catchup takeover and then immediately cancel it when we realize another secondary is ahead of us, as in HELP-24655. I believe that we are potentially in a catchup situation only when our last applied optime's term is less than our election term; if it is the same, the current primary has successfully caught up. Checking this would avoid scheduling and canceling catchup takeover. |
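The proposed check can be illustrated with a minimal Python sketch. This is not the server's C++ code: the `OpTime` class, the function name `should_schedule_catchup_takeover`, and its parameters are hypothetical names chosen for illustration, and optimes are simplified to a `(term, timestamp)` pair compared term first.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    """Simplified optime (hypothetical): ordered by term, then timestamp."""
    term: int
    timestamp: int


def should_schedule_catchup_takeover(
    current_term: int,
    my_last_applied: OpTime,
    primary_last_applied: OpTime,
    election_in_progress: bool,
) -> bool:
    """Decide, on receiving a heartbeat, whether to schedule catchup takeover.

    All names here are assumptions made for this sketch.
    """
    if election_in_progress:
        return False
    # Proposed extra check: if our last applied optime is already in the
    # current term, the primary has finished catching up, so a stale view
    # of its optime (e.g. under chaining) must not trigger a takeover.
    if my_last_applied.term >= current_term:
        return False
    # Existing condition: the primary, as we know it, is behind us.
    return primary_last_applied < my_last_applied


# Chaining case: our view of the primary is stale, but we are in term 2.
print(should_schedule_catchup_takeover(2, OpTime(2, 105), OpTime(2, 100), False))
# Genuine catchup case: our last applied optime is still in the old term.
print(should_schedule_catchup_takeover(2, OpTime(1, 50), OpTime(1, 40), False))
```

In the first call the new term check returns early, which is the scheduling-then-canceling sequence the ticket wants to avoid. In the second call the node's last applied optime is still in an earlier term, so the existing comparison against the primary's optime decides the outcome.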
| Comments |
| Comment by Githook User [ 16/Jun/21 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'} Message: |
| Comment by Githook User [ 14/Jun/21 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'} Message: |