[SERVER-623] fast new slave from a snapshot Created: 10/Feb/10  Updated: 12/Jul/16  Resolved: 17/Feb/10

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 1.3.3

Type: New Feature Priority: Major - P3
Reporter: Eliot Horowitz (Inactive) Assignee: Aaron Staple
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Now that we have lock + fsync we can take clean file system snapshots.
Given a snapshot of a master (or a slave) we should be able to create a new slave very quickly.

Process for EC2:
1) lock + fsync
2) EBS snapshot
3) mount snapshot on the new slave
4) start slave with normal parameters
5) the slave should know it's "new" because its data was set up as a master; it then takes the newest value in the oplog and puts that in local.sources

Even when not on EC2, this should be faster because you can compress the entire data set and don't have to re-create indexes.
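
Step 5 of the process above can be sketched roughly as follows. This is an illustrative model only, not the server implementation: the oplog is a plain list of dicts, timestamps are (seconds, increment) tuples standing in for BSON timestamps, and the field names host, source and syncedTo are assumptions about the local.sources document layout. The helper names newest_oplog_ts and build_source_doc are hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple

# (seconds, increment): a stand-in for a BSON timestamp (assumption).
Timestamp = Tuple[int, int]


def newest_oplog_ts(oplog: Iterable[Mapping[str, Any]]) -> Timestamp:
    """Return the largest 'ts' value found in the oplog copied with the snapshot."""
    timestamps = [entry["ts"] for entry in oplog]
    if not timestamps:
        # Without any oplog entry there is no known sync point to resume from.
        raise ValueError("snapshot has an empty oplog; no sync point available")
    return max(timestamps)


def build_source_doc(master_host: str,
                     oplog: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Build the document a snapshot-seeded slave would record in local.sources.

    The field names are assumed for illustration; the key idea is that the
    sync point is the newest oplog entry present in the snapshot, so the
    slave resumes from there instead of doing a full initial sync.
    """
    return {
        "host": master_host,
        "source": "main",
        "syncedTo": newest_oplog_ts(oplog),
    }


# Example: three operations were in the master's oplog when it was snapshotted.
snapshot_oplog = [
    {"ts": (1266300000, 1), "op": "i", "ns": "test.users"},
    {"ts": (1266300000, 2), "op": "u", "ns": "test.users"},
    {"ts": (1266300005, 1), "op": "i", "ns": "test.orders"},
]
source_doc = build_source_doc("master.example.com:27017", snapshot_oplog)
```

Because the snapshot was taken under lock + fsync, the newest oplog entry in the copied files matches the state of the data files, which is what makes it a safe starting point for the new slave.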



 Comments   
Comment by Aaron Staple [ 17/Feb/10 ]

Yeah, --fastsync is only needed for pairs. With master/slave that option is just ignored and there is no need to restart the master.

Comment by Eliot Horowitz (Inactive) [ 17/Feb/10 ]

For this case and replica pairs, restarting the master is going to be the way to go.
We're going to be working on replica sets, and there we'll work on more hot-changeable settings.
But for this case, I think we're all set now.

@aaron --fastsync is only needed for pairs, right? For a regular slave you can just start normally?

Comment by AndrewK [ 17/Feb/10 ]

@aaron - with regard to restarting the master, surely one would want to be working towards a situation where servers are never restarted unless there is something significantly wrong or a major config change is required? This is especially so when dealing with masters.

I for one would greatly appreciate more commands to control the state of a server / replication pair without restarting them. This "fastsync" is one such command. Another would be one to control swapping master/slave status in a replication pair.

Comment by auto [ 17/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 play well with other tests
http://github.com/mongodb/mongo/commit/243bb5d0527bf689eba231597cb09e3862a5c72e

Comment by Aaron Staple [ 17/Feb/10 ]

If we really want to avoid restarting the master, we could implement the reset as a command instead. But this doesn't seem to be how we do such things currently (for example replacepeer).

Comment by Aaron Staple [ 16/Feb/10 ]

Ok, the way I have it implemented now, once you do the fsync and copy, you then have to restart both pair nodes with --fastsync. This is so the peer oplog positions are set correctly on the master as well as the slave.

Comment by auto [ 16/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 specify fastsync on both nodes to eliminate slow oplog scan by new pair master
http://github.com/mongodb/mongo/commit/241ec02cae768a0da98ebf745269bf4cf9e80b67

Comment by auto [ 16/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 implement and test fastsync / snapshot repl pair mode
http://github.com/mongodb/mongo/commit/23475ac37fa350480eb0fe5e0c2800c15ad77995

Comment by auto [ 16/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 cleanup
http://github.com/mongodb/mongo/commit/2f796e3d2a608da167e67cde0dce220f2387191b

Comment by auto [ 16/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 use ReplTest fwk for test
http://github.com/mongodb/mongo/commit/37949e561c1f3cda3271b4a7939f4995cbcefa46

Comment by Eliot Horowitz (Inactive) [ 16/Feb/10 ]

only when paired

Comment by Aaron Staple [ 16/Feb/10 ]

Do we want to require --fastsync when not paired as well, or only when paired?

Comment by Aaron Staple [ 16/Feb/10 ]

ok thanks, I'll go with that

Comment by Eliot Horowitz (Inactive) [ 16/Feb/10 ]

maybe there is a flag for doing so.
something like --fastsync or something

Comment by auto [ 16/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 work old boost
http://github.com/mongodb/mongo/commit/2b7dd6ad055fa1b8ec9f63dae394985f58a8c417

Comment by Aaron Staple [ 16/Feb/10 ]

So in a paired context, how would we determine that we've just been created from a master snapshot? Would we just assume this is the case if the --pairwith cmd line spec doesn't match the saved value? That might be a bit unsafe - someone could type in the wrong thing and inadvertently wreck their repl node.

Comment by auto [ 16/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 set new slave sync point to known master tail
http://github.com/mongodb/mongo/commit/e0dbd0131feb120ad53cdf00514f8f5ff2885b5d

Comment by auto [ 16/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-623 basic test from snapshot
http://github.com/mongodb/mongo/commit/0ef13aa1d6ee7cb3a16dd2c0e5fff9e3653c4dad

Comment by Eliot Horowitz (Inactive) [ 16/Feb/10 ]

yes - unless it's an order of magnitude harder.

Comment by Aaron Staple [ 16/Feb/10 ]

Does this need to work with repl pairs too, or just master/slave?
