[SERVER-31487] Replace replSetSyncFrom resync option with initialSyncSource server parameter
Created: 10/Oct/17  Updated: 06/Dec/22  Resolved: 06/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Backlog - Replication Team
Resolution: Duplicate
Votes: 1
Labels: former-quick-wins, initialSync

Issue Links:
Duplicate
duplicates SERVER-44272 Resync data on replSetSyncFrom during... Closed
is duplicated by SERVER-33208 Remove all code related to resync com... Closed
Related
related to SERVER-44272 Resync data on replSetSyncFrom during... Closed
is related to SERVER-28840 replSetSyncFrom causes InitialSyncer ... Closed
is related to SERVER-35372 replSetSyncFrom can cause deadlock be... Closed
is related to SERVER-38731 Ability to specify sync source read p... Closed
Assigned Teams:
Replication
Sprint: Repl 2018-06-18, Repl 2018-07-02

Description

The resync command is broken and will not be fixed, but replSetSyncFrom still performs a resync when run against a node in initial sync, which is dangerous. A safer and cleaner way to choose a sync source for initial sync would be a server parameter that specifies a preferred initial sync source.
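
As a minimal sketch of how a preferred initial sync source might be honored, the Python below assumes the parameter names a single host:port and that an unreachable preferred host falls back to automatic selection. The description does not say which policy is intended, and `choose_initial_sync_source` and the host names are hypothetical. The point is that the source is fixed before initial sync starts, so nothing has to be restarted.

```python
from typing import Optional, Sequence


def choose_initial_sync_source(
    preferred: Optional[str],
    candidates: Sequence[str],
) -> Optional[str]:
    """Pick a sync source before initial sync begins (hypothetical helper).

    `preferred` stands in for the proposed startup server parameter;
    `candidates` are reachable members, in the order automatic
    selection would try them.
    """
    if preferred is not None and preferred in candidates:
        return preferred  # honor the operator's choice
    # Assumed fallback policy: behave as if no preference were given.
    return candidates[0] if candidates else None


print(choose_initial_sync_source("db2:27017", ["db1:27017", "db2:27017"]))
```

Failing startup instead of falling back would be an equally reasonable reading of "preferred"; the fallback here is only one choice.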



Comments
Comment by Siyuan Zhou [ 09/Jul/18 ]

Since `--setParameter` can be given on startup, introducing the new flag as a server parameter sounds like a valid solution.

Comment by Siyuan Zhou [ 09/Jul/18 ]

alyson.cabral, SERVER-31239 removed the resync command. The purpose of this ticket is to remove the resync behavior of `replSetSyncFrom` when it is run against a node in initial sync. We removed the resync command and behavior because replication state management is implicit and the concurrency around resync is complex and unclear. schwerin suggested reconsidering resync once initial sync state becomes explicit as part of our faster initial sync and robust initial sync projects.

Instead of a startup parameter that specifies the sync source, Andy suggested a startup parameter that puts the node into a "waiting for sync source" mode. A `replSetSyncFrom` command would then specify the sync source and start initial sync. The implementation needs some design work and would take longer than the time we allocated for quick wins.
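
Andy's proposal can be pictured as a small state machine. The Python below is a minimal sketch under stated assumptions: the class, state names, and methods are all hypothetical. `sync_from` only stands in for what `replSetSyncFrom` would do in the proposal and does not model the command's steady-state use. Nothing here reflects actual server code.

```python
from enum import Enum, auto
from typing import Optional


class NodeState(Enum):
    # Hypothetical states for illustration; not MongoDB's member states.
    STARTUP = auto()                  # running, no replset config yet
    WAITING_FOR_SYNC_SOURCE = auto()  # config received, initial sync on hold
    INITIAL_SYNC = auto()             # copying data from the sync source


class Node:
    """Toy model of the proposed "waiting for sync source" startup mode."""

    def __init__(self, wait_for_sync_source: bool) -> None:
        # Stands in for the proposed (hypothetical) startup parameter.
        self.wait_for_sync_source = wait_for_sync_source
        self.state = NodeState.STARTUP
        self.sync_source: Optional[str] = None

    def on_config_received(self) -> None:
        """Called when the node first learns the replset config."""
        if self.state is not NodeState.STARTUP:
            return
        if self.wait_for_sync_source:
            # Hold here until an operator names a source; no deadline.
            self.state = NodeState.WAITING_FOR_SYNC_SOURCE
        else:
            # Default behavior: start at once with automatic selection.
            self._start_initial_sync(None)

    def sync_from(self, host: str) -> None:
        """Plays the role replSetSyncFrom would have in the proposal."""
        if self.state is not NodeState.WAITING_FOR_SYNC_SOURCE:
            # No resync: a running initial sync is never restarted.
            raise RuntimeError(
                f"cannot set initial sync source in state {self.state.name}"
            )
        self._start_initial_sync(host)

    def _start_initial_sync(self, host: Optional[str]) -> None:
        self.sync_source = host  # None means "choose automatically"
        self.state = NodeState.INITIAL_SYNC


node = Node(wait_for_sync_source=True)
node.on_config_received()                # parks in WAITING_FOR_SYNC_SOURCE
node.sync_from("db2.example.net:27017")  # begins initial sync from db2
print(node.state.name, node.sync_source)  # INITIAL_SYNC db2.example.net:27017
```

Because the node waits without a deadline, the operator is not racing the heartbeat that delivers the config. Rejecting `sync_from` once initial sync is running mirrors dropping the resync behavior.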

Comment by Tess Avitabile (Inactive) [ 09/Jul/18 ]

Can you explain what would be intrusive about adding a startup parameter? I think it would be a good solution, since it avoids having to set a runtime parameter within a narrow time window.

Comment by Alyson Cabral (Inactive) [ 09/Jul/18 ]

Does the resync command have these same problems today?

Comment by Siyuan Zhou [ 05/Jul/18 ]

The only supported initial sync procedure is: 1) start a node; 2) the node receives a replset config, which enables replication; 3) the node enters initial sync. This ticket would allow changing the sync source before step 2 by setting the server parameter. However, this only works for a replset that has not yet been initiated, where all nodes are waiting for the very first config and the user has enough time to set the initial sync source. In the most common case, where all existing nodes already have the replset config and keep sending heartbeats to each other, a newly added node that will do an initial sync has at most 2 seconds before it receives a heartbeat and sends one of its own to fetch the config. The window for a user to set the initial sync source on the new node is therefore very narrow.
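
The size of that window can be checked with a simplified model. The sketch below assumes each existing member heartbeats the new node at the default `heartbeatIntervalMillis` of 2000 ms with an arbitrary phase. It ignores network latency and the time needed to fetch the config. `config_delay_ms` is a hypothetical helper, not server code.

```python
import random
from typing import Sequence


def config_delay_ms(start_ms: int, member_phases_ms: Sequence[int],
                    interval_ms: int = 2000) -> int:
    """Delay from a new node's startup until the first heartbeat reaches it.

    Simplified model: each member heartbeats at phase + k * interval_ms for
    integer k; latency and the config fetch itself are ignored.
    """
    first_arrivals = []
    for phase in member_phases_ms:
        # Smallest k with phase + k * interval_ms >= start_ms (ceiling division).
        k = -((phase - start_ms) // interval_ms)
        first_arrivals.append(phase + k * interval_ms)
    return min(first_arrivals) - start_ms


# Worst case over random startup times and member phases, 3-member set.
rng = random.Random(0)
worst = max(
    config_delay_ms(rng.randrange(10_000), [rng.randrange(2000) for _ in range(3)])
    for _ in range(1000)
)
print(worst)  # stays below the 2000 ms interval
```

Since each member's next heartbeat arrives less than one interval after startup, adding more members can only shrink the window.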

Adding a new startup parameter, or a new field in the replset config, to specify the initial sync source would solve the problem, but both seem too intrusive. alyson.cabral and tess.avitabile, what do you think about the behavioral change?

If we decide to follow the original design, I would propose allowing the replSetSyncFrom command to run before a node is initialized, which is the minimal code change.

Comment by Spencer Brody (Inactive) [ 19/Apr/18 ]

As part of this we should remove any lingering code related to the resync command.

Comment by Spencer Brody (Inactive) [ 30/Mar/18 ]

Note that we'll keep replSetSyncFrom for specifying a sync source during steady-state replication. This ticket is to remove the 'resync' option from replSetSyncFrom and to make the new server parameter the way to specify a sync source for initial sync.

Generated at Thu Feb 08 04:27:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.