[SERVER-25756] Replication should ensure that minValid is hit exactly Created: 23/Aug/16  Updated: 06/Dec/22  Resolved: 27/Nov/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mathias Stearn Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-24223 Add hash to minvalid OpTime boundaries Closed
Assigned Teams:
Replication
Participants:

 Description   

Currently, replication only checks that we reach an optime >= minValid before becoming secondary; it does not check that we actually applied the exact minValid optime. This can lead to undetected corruption if an upstream rollback occurred between the time we fetched the documents that caused us to set minValid and the time we fetched an oplog entry that is >= minValid. Note that if we detect this state, the only possible fix is a full resync.
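To illustrate the gap described above, here is a minimal sketch in Python rather than the server's actual code. The `OpTime` tuple, its term-then-timestamp ordering, and both function names are assumptions made for the example. It contrasts the existing ">= minValid" check with an exact-match check on a history where the minValid entry was rolled back upstream.

```python
from typing import List, NamedTuple


class OpTime(NamedTuple):
    # Hypothetical simplified optime; tuple ordering compares term first,
    # then timestamp. This ordering is an assumption for the sketch.
    term: int
    timestamp: int


def reaches_min_valid_loosely(applied: List[OpTime], min_valid: OpTime) -> bool:
    """The kind of check the ticket describes: some applied op is >= minValid."""
    return any(op >= min_valid for op in applied)


def hits_min_valid_exactly(applied: List[OpTime], min_valid: OpTime) -> bool:
    """The stricter check the ticket asks for: minValid itself was applied."""
    return min_valid in applied


min_valid = OpTime(term=1, timestamp=105)

# History with no upstream rollback: minValid is part of what we applied.
clean_history = [OpTime(1, 100), OpTime(1, 105), OpTime(1, 110)]

# History after an upstream rollback: the (1, 105) entry is gone and the
# new branch continues in term 2, so we pass minValid without applying it.
rolled_back_history = [OpTime(1, 100), OpTime(2, 103), OpTime(2, 110)]

print(reaches_min_valid_loosely(rolled_back_history, min_valid))
print(hits_min_valid_exactly(rolled_back_history, min_valid))
```

On the rolled-back history the loose check passes while the exact check fails, which is the undetected state the description refers to.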



 Comments   
Comment by Gregory McKeon (Inactive) [ 27/Nov/17 ]

This will go away with recoverable rollback.

Comment by Judah Schvimer [ 15/Nov/17 ]

redbeard0531, do we still expect this to be possible? The sync source resolver checks for the requiredOpTime (minValid) after getting the sync source's RBID, and then we check the RBID for equality after receiving the first batch of documents. If a rollback occurs on the sync source after that, it should kill our cursor and make us go back into sync source selection. We should therefore never sync operations from a branch of history that does not include minValid.
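The ordering of checks described in this comment can be sketched roughly as follows. This is a toy Python model, not the sync source resolver itself: `FakeSyncSource`, `choose_and_fetch`, the `between` hook, and the use of plain integers as optimes are all hypothetical and exist only to show why a rollback between the two checks is detected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeSyncSource:
    # Hypothetical stand-in for a sync source: a rollback ID plus an oplog
    # represented as a list of integer optimes.
    rbid: int
    oplog: List[int]

    def rollback(self, keep: int) -> None:
        """Drop all but the first `keep` entries and bump the rollback ID."""
        self.oplog = self.oplog[:keep]
        self.rbid += 1


def choose_and_fetch(
    source: FakeSyncSource,
    required_optime: int,
    between: Optional[Callable[[FakeSyncSource], None]] = None,
) -> Optional[List[int]]:
    rbid_before = source.rbid                  # 1. record the source's RBID
    if required_optime not in source.oplog:    # 2. source must contain minValid
        return None
    if between is not None:                    # test hook: simulate a rollback here
        between(source)
    batch = list(source.oplog)                 # 3. fetch the first batch
    if source.rbid != rbid_before:             # 4. RBID must be unchanged
        return None                            #    otherwise reselect a sync source
    return batch


stable = FakeSyncSource(rbid=1, oplog=[100, 105, 110])
ok_batch = choose_and_fetch(stable, required_optime=105)

unstable = FakeSyncSource(rbid=1, oplog=[100, 105, 110])
rejected = choose_and_fetch(unstable, 105, between=lambda s: s.rollback(keep=1))
```

In this model, a rollback that removes minValid after step 2 changes the RBID, so step 4 rejects the batch instead of letting the node sync from a branch that lacks minValid.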

I agree, however, that we should invariant that we in fact hit minValid exactly.
