[SERVER-4766] Make initial sync restartable per collection Created: 24/Jan/12  Updated: 06/Dec/22  Resolved: 20/Nov/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Richard Kreuter (Inactive) Assignee: Backlog - Replication Team
Resolution: Won't Do Votes: 7
Labels: PM248, initialSync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-4658 retry if failed when doing initial sync Closed
is duplicated by SERVER-9752 Resyncing a Stale Member, Stucked tor... Closed
Related
related to SERVER-9115 Log Initial Sync Progress Closed
is related to SERVER-18521 replica in STARTUP2 state cannot be s... Closed
is related to SERVER-22244 Detect sync source rollbacks during i... Closed
Assigned Teams:
Replication
Participants:

 Description   

Allow a node that is restarted during initial sync to resume, so that it only needs to clone the collections which had not finished cloning before the restart.

This will require ensuring that the oplog still exists from the start of the cloning (from before the restart), and that no rollback has occurred which would invalidate existing cloned data.
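
The two preconditions above can be sketched as a single check. This is only an illustration under assumptions: `SyncCheckpoint`, `can_resume`, and the `rollback_id` marker are hypothetical names, not server APIs, and opTimes are modeled as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncCheckpoint:
    """Hypothetical state persisted by the syncing node before a restart."""
    clone_start_optime: int  # opTime at which cloning began (modeled as an int)
    rollback_id: int         # rollback marker observed when cloning began


def can_resume(checkpoint: SyncCheckpoint,
               oldest_available_optime: int,
               current_rollback_id: int) -> bool:
    """Return True only if already-cloned data can still be trusted."""
    # The oplog must still reach back to the start of cloning; otherwise
    # operations that ran during the clone can no longer be replayed.
    oplog_covers_start = oldest_available_optime <= checkpoint.clone_start_optime
    # A rollback since cloning began may have invalidated copied documents.
    no_rollback = current_rollback_id == checkpoint.rollback_id
    return oplog_covers_start and no_rollback
```

If either condition fails, the node would presumably have to fall back to a full initial sync from scratch.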

Old Description
Currently in initial sync, if the clone fails due to server crash or shutdown, we restart from scratch. It seems like it ought to be possible to record progress as we go so that we can pick up from wherever we left off. (For example, if the clone used the _id index and occasionally persisted the last written _id for each collection it visited, then it could pick up from the last _id seen. Reasoning about the minvalid oplog entry would remain unchanged, I believe.)
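
A minimal sketch of the `_id` checkpoint idea, using in-memory lists and dicts in place of real collections. Everything here (`clone_collection`, `checkpoints`, `checkpoint_every`, `fail_after`) is hypothetical and for illustration only; it is not how the server implements cloning.

```python
from typing import Any, Dict, List, Optional


def clone_collection(source_docs: List[Dict[str, Any]],
                     target: Dict[Any, Dict[str, Any]],
                     checkpoints: Dict[str, Any],
                     ns: str,
                     checkpoint_every: int = 2,
                     fail_after: Optional[int] = None) -> None:
    """Copy documents in _id order, resuming after the last checkpointed _id."""
    last_id = checkpoints.get(ns)  # None means no progress was recorded
    copied = 0
    for doc in sorted(source_docs, key=lambda d: d["_id"]):
        if last_id is not None and doc["_id"] <= last_id:
            continue  # already cloned before the restart
        if fail_after is not None and copied >= fail_after:
            raise RuntimeError("simulated crash during clone")
        target[doc["_id"]] = doc  # keyed write, so re-copying is harmless
        copied += 1
        if copied % checkpoint_every == 0:
            checkpoints[ns] = doc["_id"]  # occasionally persist progress
    if source_docs:
        # Record the final _id so a later restart skips this collection.
        checkpoints[ns] = max(d["_id"] for d in source_docs)
```

Because the checkpoint is only written occasionally, a restart may re-copy a few documents written after the last checkpoint; the sketch relies on writes keyed by `_id` being idempotent.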

Operationally, this would make getting out of certain stuck cases less irritating for users, e.g., if a fresh node never goes from RECOVERING to SECONDARY for some reason, they could at least know that if they restart the process, we'll try our best to minimize subsequent recovery time, rather than starting over.



 Comments   
Comment by Steven Vannelli [ 20/Nov/19 ]

Closing this ticket out, as the work for this project is captured within other tickets.

Comment by Scott Hernandez (Inactive) [ 05/May/16 ]

I expect that we will store a document per collection in a new "local.replset.initialSyncProgress" collection which looks something like this:

{ ns:"db.coll_name", documentCountAtStart:###, start:{date:.., opTime:...}, end:{date:.., opTime:...}, elapsedTimeMS:{inserts:###, indexCommit:###, ...}}
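
For illustration, a per-collection document of that shape could be assembled as follows. The helper name `make_progress_doc` and its parameters are hypothetical, the opTime representation is left opaque, and only the two `elapsedTimeMS` fields named above are included.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def make_progress_doc(ns: str, document_count_at_start: int,
                      start_date: datetime, start_optime: Any,
                      end_date: datetime, end_optime: Any,
                      inserts_ms: int, index_commit_ms: int) -> Dict[str, Any]:
    """Build one progress document for a single cloned collection."""
    return {
        "ns": ns,
        "documentCountAtStart": document_count_at_start,
        "start": {"date": start_date, "opTime": start_optime},
        "end": {"date": end_date, "opTime": end_optime},
        # Only the fields named in the comment; any others are unspecified.
        "elapsedTimeMS": {"inserts": inserts_ms, "indexCommit": index_commit_ms},
    }
```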
