[SERVER-18041] Support parallel cloning during initial sync Created: 14/Apr/15  Updated: 08/Jan/24

Status: Investigating
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Crystal Horn Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 10
Labels: PM248, initialSync, pmr
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-24069 Add parameter to control concurrency ... Backlog
Duplicate
is duplicated by SERVER-19022 Parallelize Initial Syncs Closed
Related
related to SERVER-7680 Have replSetSyncFrom restart initialS... Closed
related to SERVER-22061 DataReplicator: Support resume per co... Closed
is related to SERVER-7527 Improve speed of replication when in ... Closed
Assigned Teams:
Replication

 Comments   
Comment by Ramon Fernandez Marina [ 11/Mar/17 ]

Hi Roy,

Unfortunately initial sync is not resumable in 3.4 yet; I believe that work is defined in these tickets.

If I'm not mistaken, there were 512 tickets related to replication in the 3.4 development cycle, 92 of which mention initial sync. While initial sync has been improved in many ways, I'm listing the highlights below:

  • initial sync is more resilient to network issues because the network stack was completely replaced, and the new implementation handles transient failures better (list of Networking tickets in 3.4)
  • indexes are built while the data is being cloned. For data sets larger than physical memory, this represents a significant speed-up. The change was hidden in a rather opaque ticket (SERVER-23059), but the idea is that as the documents are cloned, we extract index keys for each defined index in a single pass while inserting the documents into the new secondary. Before, the documents would be cloned in one pass, then traversed to build the _id index in a second pass, then traversed in a third pass to extract keys for all defined secondary indexes. For very large data sets this can result in close to a 3x performance improvement (a sketch contrasting the two approaches follows this list)
  • MongoDB 3.4 also adds the ability to compress intra-cluster communication (SERVER-3018). This is turned off by default, and only makes a difference when the two nodes are constrained by network bandwidth and the data is compressible
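
To make the single-pass idea above concrete, here is a minimal illustrative sketch in Python. All names here (Index, extract_key, clone_*) are hypothetical stand-ins, not the actual C++ cloner internals; the only point is the number of passes over the data.

{code:python}
from dataclasses import dataclass
from typing import Callable

@dataclass
class Index:
    """Hypothetical stand-in for a defined secondary index."""
    name: str
    extract_key: Callable[[dict], object]  # e.g. lambda doc: doc["user_id"]

def clone_three_pass(source_docs, indexes):
    """Pre-3.4 shape: one pass to copy, two more passes to build indexes."""
    storage = [doc for doc in source_docs]            # pass 1: clone documents
    id_index = {doc["_id"]: doc for doc in storage}   # pass 2: build the _id index
    secondary_keys = [(ix.name, ix.extract_key(doc))  # pass 3: secondary index keys
                      for doc in storage for ix in indexes]
    return storage, id_index, secondary_keys

def clone_single_pass(source_docs, indexes):
    """3.4 shape: extract keys for every defined index while inserting each doc."""
    storage, id_index, secondary_keys = [], {}, []
    for doc in source_docs:                           # single pass over the data
        storage.append(doc)
        id_index[doc["_id"]] = doc
        for ix in indexes:
            secondary_keys.append((ix.name, ix.extract_key(doc)))
    return storage, id_index, secondary_keys
{code}

When the data set is much larger than RAM, each extra pass re-reads everything from disk, which is why collapsing three passes into one can approach the reported 3x improvement.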

In our internal sharded clusters, with live use and the balancer enabled, we've seen initial sync go from 5-7 days to a few hours.

Hope this helps.

Regards,
Ramón.

Comment by Roy Reznik [ 07/Mar/17 ]

Hi Ramon,

I watched that ticket.
Except for the statements that MongoDB 3.4 should be resilient to network issues (without specifying how or why) and that the oplog is copied concurrently with the data (which is not expected to make it faster), I did not see anything else.
Do you have any full specifications as to why 3.4 is faster, and why/how 3.4 is resumable?

Roy.

Comment by Ramon Fernandez Marina [ 09/Nov/16 ]

Hi royrez@microsoft.com,

3.4 comes with faster, resumable initial sync. We're working on the documentation for these new features (DOCS-8293), so feel free to watch that ticket for updates if interested.

We've also published three release candidates. 3.4.0-rc2 is the latest at the time of this writing, and you can download it and test these features. If you do any testing and find any issues please open new SERVER tickets so we can investigate them.

Thanks,
Ramón.

Comment by Roy Reznik [ 06/Nov/16 ]

Is it still planned for 3.4?

Comment by Scott Hernandez (Inactive) [ 04/Jan/16 ]

liranms, thanks for the pull request. I've added some comments there. Let's work on that until we have a plan, and then come back to Jira for the next steps.

dynamike, this has slipped from 3.2 as expected, but we are working hard on getting it into the 3.4 release, replacing both the cloner and the data (delta = oplog) replication process. We will then have more time to discuss and understand the upstream consequences of increasing replication concurrency and how it will affect end users.

The current plan is to support parallel copying at the collection level, so we can handle databases with many collections as well as a small number of collections spread across many databases. There may also be support for resuming the cloning process if the initial sync is interrupted (for example, by a system shutdown), so that only the missing collections need to be cloned.
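
As a rough illustration of collection-level parallelism with resume, here is a hypothetical Python sketch; clone_collection() and the checkpoint set stand in for the real cloner and its metadata, and nothing here reflects actual server code.

{code:python}
from concurrent.futures import ThreadPoolExecutor, as_completed

def clone_collection(namespace: str) -> str:
    """Placeholder: copy one collection's documents from the sync source."""
    # ... batched reads from the sync source + local inserts would go here ...
    return namespace

def parallel_initial_sync(all_collections, already_cloned, max_workers=4):
    """Clone collections concurrently, skipping any that finished before a restart."""
    done = set(already_cloned)              # resume: checkpoint of finished clones
    todo = [ns for ns in all_collections if ns not in done]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(clone_collection, ns): ns for ns in todo}
        for fut in as_completed(futures):
            done.add(fut.result())          # would be persisted for resumability
    return done
{code}

Bounding max_workers is what keeps collection-level parallelism from overwhelming the sync source, which ties into the throttling discussion further down this thread.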

Comment by Liran Moysi [ 24/Dec/15 ]

It is extremely important to support parallel cloning, especially during the index build phase.
As it happens, our MongoDB replicas sit on some powerful machines, and it's a waste to use a single core, especially with engines like WiredTiger and RocksDB that use compression.
It wouldn't be as serious had it not prevented us from adding new replicas with different engines: they take a lot of time to build the indexes (on the order of days), which causes them to lose sync, even after enlarging the oplog window.

Regarding the DOS attack that @scotthernandez mentioned, it's less relevant for the index building stage (which happens on the node itself), so parallelizing that stage would do no harm.

Comment by Michael Kania [ 18/Jun/15 ]

Totally agree that keeping the default initial sync rate heavily limited, with the ability to tune it dynamically, is the correct way to do it. Looking forward to the new Data Replicator stuff.

Comment by Scott Hernandez (Inactive) [ 18/Jun/15 ]

It is not planned for 3.2 at this time.

There are too many open questions about performance and the load it would create upstream on the sync source to schedule it now. In addition, there are currently no configurable options for initial sync, and without adaptive load monitoring/control one would, at a minimum, want to control how many collections are cloned at once. We don't want to introduce a feature that can effectively DoS other members of the replica set during initial sync; some people have actually seen the current initial sync process cause performance degradation on live systems because it can't be throttled.
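
For illustration only, one common way to add the missing throttle is a token bucket that caps how many bytes per second all cloner threads combined may pull from the sync source. This is a hypothetical Python sketch, not anything that exists in mongod.

{code:python}
import threading
import time

class TokenBucket:
    """Caps the bytes/sec a cloner may pull from its sync source."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def consume(self, nbytes: int) -> None:
        """Block until nbytes of budget is available, then spend it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at the burst size.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                deficit = nbytes - self.tokens
            time.sleep(deficit / self.rate)  # wait for the bucket to refill
{code}

Each cloner thread would call bucket.consume(len(batch)) before requesting its next batch, so the aggregate pull rate stays bounded no matter how many collections are cloned in parallel.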

The good news is that the new Data Replicator components we will soon have internally will allow us to support concurrent clones relatively easily when the time comes.

Comment by Michael Kania [ 18/Jun/15 ]

Is this planned for 3.2?
