[SERVER-17710] Do not automatically wipe existing data before initial sync
Created: 24/Mar/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.1
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Andre de Frere
Assignee: Backlog - Replication Team
Resolution: Unresolved
Votes: 0
Labels: PM248, initialization, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-28006 Unexpected Initial Sync on Replicaset Closed
Assigned Teams:
Replication
Backwards Compatibility: Major Change
Participants:
Case:

 Description   

When running rs.initiate(), if any nodes other than the initiator node have data (but not an oplog), an error results:

"errmsg" : "couldn't initiate : member <node> has data already, cannot initiate set.  All members except initiator must be empty."

If you later use rs.add() to add a node that has data (but not an oplog), that node will initial sync and no error is thrown.

If you can rs.add() a node with data in it, it follows that you should be able to have a configuration object for rs.initiate() that contains a node with data in it (and both should cause an initial sync of that node).
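
The asymmetry described above can be sketched as a small model. This is only an illustration of the behavior reported in this ticket, not server code; the member shape `{ host, hasData, hasOplog }` and the function names `initiateOutcome` and `addOutcome` are hypothetical.

```javascript
// Illustrative model only; not MongoDB server code.
// The member shape { host, hasData, hasOplog } is a hypothetical stand-in.

// Models rs.initiate(): the config is rejected if any member other than
// the initiator already holds data.
function initiateOutcome(members, initiatorHost) {
  for (const m of members) {
    if (m.host !== initiatorHost && m.hasData) {
      return {
        ok: 0,
        errmsg: "couldn't initiate : member " + m.host +
          " has data already, cannot initiate set.  " +
          "All members except initiator must be empty."
      };
    }
  }
  return { ok: 1 };
}

// Models rs.add(): the member is accepted; a node without an oplog then
// initial syncs, which wipes whatever data it held.
function addOutcome(member) {
  const initialSync = !member.hasOplog;
  return { ok: 1, initialSync, dataWiped: initialSync && member.hasData };
}
```

With the same data-bearing node, `initiateOutcome` returns the error while `addOutcome` reports a silent wipe, which is the inconsistency this ticket is about.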



 Comments   
Comment by Eric Milkie [ 07/Apr/15 ]

Once begun, an initial sync always clears all the data as a first step; there are no deletion criteria to invoke there. Essentially, the deletion criteria are the initial sync triggers.
If a node's current config replica set name does not match the name in a proposed config via a replSetReconfig command, it will not accept the proposed config – thus it never considers running an initial sync.

Comment by David Murphy [ 07/Apr/15 ]

Eric,

I was not suggesting initial sync triggers but deletion criteria: if either of those is true, then the initial sync can continue; if false, it means either it was not in a replSet but has data, or it was from another replSet. Those are the cases where you would want to bail, in my line of thought. Initial sync should remove data if it's from the current replSet and has an oplog (aka someone manually called resync).

Does that make more sense?

David

Comment by Eric Milkie [ 07/Apr/15 ]

I think it was an oversight to allow a reconfig to add a node and have its data wiped. This ticket will be the work to remove that misfeature; an admin will then need to erase data on the node by hand as preparation for initial sync.

dmurphy your conditions listed above do not trigger an initial sync. The trigger conditions for initial sync are:
1. Previous initial sync was incomplete.
2. local.oplog.rs is empty or nonexistent.
3. Someone runs the resync command.
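
As a rough sketch, the three triggers above amount to a single predicate. The state object and its field names (`previousInitialSyncIncomplete`, `oplogExists`, `oplogEntryCount`, `resyncRequested`) are assumptions made for illustration and do not correspond to actual server internals.

```javascript
// Illustrative predicate for the three initial sync triggers listed above.
// All field names on `state` are hypothetical.
function needsInitialSync(state) {
  // Trigger 2: the oplog collection is missing or has no entries.
  const oplogMissingOrEmpty =
    !state.oplogExists || state.oplogEntryCount === 0;

  return Boolean(
    state.previousInitialSyncIncomplete || // trigger 1
    oplogMissingOrEmpty ||                 // trigger 2
    state.resyncRequested                  // trigger 3
  );
}
```

Note that nothing in this predicate looks at whether the node holds user data, which is why a data-bearing node without an oplog gets wiped.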

Comment by David Murphy [ 07/Apr/15 ]

I would say it should only remove data if

1) There is an oplog
2) Its data in local says it was a valid slave for this replSet name at some point.

Otherwise it would wipe the data if it syncs with a new replSet string.
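
One possible reading of the two criteria in this comment, sketched as a predicate. It assumes both conditions must hold together, and the field names `hasOplog` and `lastKnownSetName` are hypothetical placeholders for whatever the node's local database records.

```javascript
// Illustrative sketch of the proposed deletion criteria; not server code.
// `hasOplog` and `lastKnownSetName` are hypothetical field names.
function mayWipeData(node, targetSetName) {
  // Criterion 1: the node has an oplog.
  // Criterion 2: its local data says it belonged to this replSet name before.
  return Boolean(node.hasOplog && node.lastKnownSetName === targetSetName);
}
```

Under this reading, a standalone node with data but no oplog, or a node from a differently named set, would keep its data and the sync would bail.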

Comment by Eric Milkie [ 30/Mar/15 ]

I would argue that we should be going the other way in terms of safety over convenience; we should change initial sync to not delete data if data is present, except for the case where we are restarting a prior failed initial sync.
