[SERVER-25756] Replication should ensure that minValid is hit exactly
Created: 23/Aug/16 | Updated: 06/Dec/22 | Resolved: 27/Nov/17
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Replication |
| Description |
|
Currently, replication only checks that our last applied optime is >= minValid before we become a secondary; it does not check that we actually applied the exact minValid optime. This can lead to undetected corruption if an upstream rollback occurred between the time we fetched the documents that caused us to set minValid and the time we fetched the oplog entry that is >= minValid. Note that if we detect this state, the only possible fix is a full resync. |
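
The gap between the two checks can be illustrated with a minimal sketch. This is not the server's code: `OpTime`, `reached_min_valid`, and `hit_min_valid_exactly` are hypothetical names, and an optime is simplified to a `(term, timestamp)` pair compared lexicographically, which is an assumption made for illustration only.

```python
from typing import Iterable, NamedTuple


class OpTime(NamedTuple):
    # Simplified, hypothetical optime: compared as (term, timestamp).
    term: int
    timestamp: int


def reached_min_valid(last_applied: OpTime, min_valid: OpTime) -> bool:
    # The loose check described above: only requires last applied >= minValid.
    return last_applied >= min_valid


def hit_min_valid_exactly(applied: Iterable[OpTime], min_valid: OpTime) -> bool:
    # The stricter check proposed here: minValid itself must have been applied.
    return any(op == min_valid for op in applied)


# minValid was set from documents fetched on the original branch of history.
min_valid = OpTime(term=1, timestamp=105)

# After an upstream rollback, the entries we apply come from a different
# branch that never contained minValid.
post_rollback_branch = [OpTime(1, 100), OpTime(2, 104), OpTime(2, 110)]

print(reached_min_valid(post_rollback_branch[-1], min_valid))   # True
print(hit_min_valid_exactly(post_rollback_branch, min_valid))   # False
```

In this made-up scenario the loose check passes even though the exact minValid optime was never applied, which is the undetected state the description warns about.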
| Comments |
| Comment by Gregory McKeon (Inactive) [ 27/Nov/17 ] |
|
This will go away with recoverable rollback. |
| Comment by Judah Schvimer [ 15/Nov/17 ] |
|
redbeard0531, do we still expect this to be possible? The sync source resolver checks for the requiredOpTime (minValid) after getting the sync source's RBID, and then we check the RBID for equality after receiving the first batch of documents. If a rollback occurs on the sync source after that, it should kill our cursor and make us go back into sync source selection. We thus should never sync operations from a branch of history that does not include minValid. I agree, however, that we should invariant that we in fact hit minValid exactly. |
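
The ordering of checks described in the comment above can be sketched roughly as follows. Everything here is hypothetical and simplified: `FakeSyncSource`, `fetch_first_batch`, and the `before_fetch` hook are illustration-only names, not the server's sync source resolver or fetcher APIs, and optimes are plain `(term, timestamp)` tuples.

```python
from typing import Callable, List, Optional, Tuple

OpTime = Tuple[int, int]  # hypothetical (term, timestamp) pair


class FakeSyncSource:
    """Toy stand-in for a sync source with a rollback ID (RBID) and an oplog."""

    def __init__(self, rbid: int, oplog: List[OpTime]) -> None:
        self.rbid = rbid
        self.oplog = list(oplog)

    def get_rbid(self) -> int:
        return self.rbid

    def has_optime(self, optime: OpTime) -> bool:
        return optime in self.oplog

    def first_batch(self) -> List[OpTime]:
        return list(self.oplog)

    def roll_back(self, keep: int) -> None:
        # A rollback truncates history and bumps the RBID.
        self.oplog = self.oplog[:keep]
        self.rbid += 1


def fetch_first_batch(
    source: FakeSyncSource,
    required_optime: OpTime,
    before_fetch: Optional[Callable[[], None]] = None,
) -> Optional[List[OpTime]]:
    # 1. Record the sync source's RBID first.
    rbid_before = source.get_rbid()
    # 2. Then confirm the source has the required optime (minValid).
    if not source.has_optime(required_optime):
        return None  # reject this sync source
    if before_fetch is not None:
        before_fetch()  # test hook: simulate a rollback in the window
    # 3. Fetch the first batch, then compare the RBID for equality.
    batch = source.first_batch()
    if source.get_rbid() != rbid_before:
        return None  # a rollback happened; go back to sync source selection
    return batch


source = FakeSyncSource(rbid=1, oplog=[(1, 100), (1, 105)])
print(fetch_first_batch(source, (1, 105)))                               # [(1, 100), (1, 105)]
print(fetch_first_batch(source, (1, 105), lambda: source.roll_back(1)))  # None
```

Because the RBID is read before the requiredOpTime check and compared again after the first batch, a rollback anywhere in that window is detected in this toy model. Rollbacks after the first batch are assumed to be handled by the cursor being killed, which the sketch does not model.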