-
Type: Improvement
-
Resolution: Won't Do
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Replication
Allow an initial sync node to be restarted so that it only needs to clone the collections that had not finished cloning before the restart.
This will require ensuring that the oplog still exists from the start of the cloning (from before the restart), and that no rollback has occurred which would invalidate the data already cloned.
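The two preconditions above could be checked roughly as follows. This is only an illustrative sketch, not server code: `SyncCheckpoint`, `can_resume`, the integer timestamps, and the idea of comparing a rollback counter on the sync source are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncCheckpoint:
    """Hypothetical state persisted when cloning first started."""
    begin_ts: int            # oplog timestamp at the start of cloning
    source_rollback_id: int  # rollback counter seen on the sync source then


def can_resume(checkpoint: Optional[SyncCheckpoint],
               oldest_source_oplog_ts: int,
               current_source_rollback_id: int) -> bool:
    """Return True only if previously cloned data is still usable."""
    if checkpoint is None:
        # Nothing was persisted, so there is nothing to resume from.
        return False
    # The oplog must still reach back to where cloning began; otherwise
    # some writes made during the clone can no longer be replayed.
    oplog_covers_start = oldest_source_oplog_ts <= checkpoint.begin_ts
    # A changed counter means the source rolled back in the meantime,
    # which may have invalidated already cloned documents.
    no_rollback = current_source_rollback_id == checkpoint.source_rollback_id
    return oplog_covers_start and no_rollback
```

If either check fails, the node would fall back to today's behavior and restart the initial sync from scratch.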
Old Description
Currently in initial sync, if the clone fails due to server crash or shutdown, we restart from scratch. It seems like it ought to be possible to record progress as we go so that we can pick up from wherever we left off. (For example, if the clone used the _id index and occasionally persisted the last written _id for each collection it visited, then it could pick up from the last _id seen. Reasoning about the minvalid oplog entry would remain unchanged, I believe.)
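A toy, in-memory sketch of the `_id` checkpointing idea described above. Everything here is hypothetical: `clone_collection`, the `checkpoints` dict standing in for persisted progress, and the `crash_after` parameter used to simulate a shutdown. It assumes `_id` values within a collection are mutually comparable.

```python
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]


def clone_collection(name: str,
                     source: List[Doc],
                     target: Dict[Any, Doc],
                     checkpoints: Dict[str, Any],
                     persist_every: int = 2,
                     crash_after: Optional[int] = None) -> int:
    """Copy documents in _id order, resuming after checkpoints[name].

    Returns the number of documents copied by this call.
    """
    last_id = checkpoints.get(name)
    # Walk the source in _id order, as a scan of the _id index would,
    # skipping everything at or below the last persisted _id.
    pending = sorted(
        (d for d in source if last_id is None or d["_id"] > last_id),
        key=lambda d: d["_id"],
    )
    copied = 0
    for doc in pending:
        if crash_after is not None and copied >= crash_after:
            raise RuntimeError("simulated crash during clone")
        # Keyed by _id, so re-copying a document after a restart is harmless.
        target[doc["_id"]] = doc
        copied += 1
        if copied % persist_every == 0:
            # Occasionally persist the last written _id.
            checkpoints[name] = doc["_id"]
    if pending:
        checkpoints[name] = pending[-1]["_id"]
    return copied
```

With five documents, `persist_every=2`, and a simulated crash after three copies, the checkpoint holds the second `_id`; a second call re-copies the third document and then finishes the remaining two. Documents changed or deleted on the source during the clone are not handled here; that would still rely on oplog application, as the description notes.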
Operationally, this would make getting out of certain stuck cases less irritating for users, e.g., if a fresh node never goes from RECOVERING to SECONDARY for some reason, they could at least know that if they restart the process, we'll try our best to minimize subsequent recovery time, rather than starting over.
- is duplicated by
-
SERVER-4658 retry if failed when doing initial sync
- Closed
-
SERVER-9752 Resyncing a Stale Member, Stucked tor STARTUP2
- Closed
- is related to
-
SERVER-18521 replica in STARTUP2 state cannot be stopped
- Closed
-
SERVER-22244 Detect sync source rollbacks during initial sync
- Closed
- related to
-
SERVER-9115 Log Initial Sync Progress
- Closed