<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:46:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-18041] Support parallel cloning during initial sync</title>
                <link>https://jira.mongodb.org/browse/SERVER-18041</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description></description>
                <environment></environment>
        <key id="195942">SERVER-18041</key>
            <summary>Support parallel cloning during initial sync</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="11262" iconUrl="https://jira.mongodb.org/images/icons/statuses/generic.png" description="">Investigating</status>
                    <statusCategory id="4" key="indeterminate" colorName="inprogress"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-server-repl">Backlog - Replication Team</assignee>
                                    <reporter username="crystal.horn@mongodb.com">Crystal Horn</reporter>
                        <labels>
                            <label>PM248</label>
                            <label>initialSync</label>
                            <label>pmr</label>
                    </labels>
                <created>Tue, 14 Apr 2015 13:11:51 +0000</created>
                <updated>Mon, 8 Jan 2024 19:24:31 +0000</updated>
                                                                            <component>Replication</component>
                                        <votes>10</votes>
                                    <watches>32</watches>
                                                                                                                <comments>
                            <comment id="1521834" author="ramon.fernandez" created="Sat, 11 Mar 2017 13:28:15 +0000"  >&lt;p&gt;Hi Roy,&lt;/p&gt;

&lt;p&gt;Unfortunately initial sync is not resumable in 3.4 yet; I believe that work is defined in &lt;a href=&quot;https://jira.mongodb.org/issues/?jql=key%20in%20(SERVER-18564%2C%20SERVER-22061%2C%20SERVER-18565)&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;these tickets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If I&apos;m not mistaken, there were &lt;a href=&quot;https://jira.mongodb.org/issues/?jql=project%20%3D%20server%20and%20resolution%20%3D%20Fixed%20and%20component%20%3D%20%22Replication%22%20and%20issueFunction%20in%20issuefieldmatch(%22project%20%3D%20SERVER%22%2C%20%22fixVersions%22%2C%20%22(3.3.*%7C3.4.0*)%22)&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;512 tickets&lt;/a&gt; related to replication in the 3.4 development cycle, &lt;a href=&quot;https://jira.mongodb.org/issues/?jql=project%20%3D%20server%20and%20resolution%20%3D%20Fixed%20and%20component%20%3D%20%22Replication%22%20and%20issueFunction%20in%20issuefieldmatch(%22text%20~%20%27initial%20sync%27%22%2C%20%22fixVersions%22%2C%20%22(3.3.*%7C3.4.0*)%22)&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;92 of which mention initial sync&lt;/a&gt;. While there are many ways that initial sync has been improved, I&apos;m listing the highlights below:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Initial sync is more resilient to network issues because the network stack was completely replaced, and the new implementation handles network issues better (&lt;a href=&quot;https://jira.mongodb.org/issues/?jql=project%20%3D%20server%20and%20resolution%20%3D%20Fixed%20and%20component%20%3D%20%22Networking%22%20and%20issueFunction%20in%20issuefieldmatch(%22project%20%3D%20SERVER%22%2C%20%22fixVersions%22%2C%20%22(3.3.*%7C3.4.0*)%22)&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;list of Networking tickets in 3.4&lt;/a&gt;)&lt;/li&gt;
	&lt;li&gt;Indexes are built while the data is being cloned. For data sets that are larger than physical memory, this represents a significant speedup. The change was hidden in a rather opaque ticket (&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-23059&quot; title=&quot;Collection and Database Cloner: Implement storage engine interface&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-23059&quot;&gt;&lt;del&gt;SERVER-23059&lt;/del&gt;&lt;/a&gt;), but the idea is that as the documents are cloned, we extract index keys for each defined index in a single pass while inserting the documents into the new secondary. Previously, the documents would be cloned in one pass, then traversed to build the _id index in a second pass, then traversed in a third pass to extract keys for all defined secondary indexes. For very large data sets this can result in close to 3x performance improvements&lt;/li&gt;
	&lt;li&gt;MongoDB 3.4 also adds the ability to compress intra-cluster communication (&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-3018&quot; title=&quot;Compression of wire protocol&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-3018&quot;&gt;&lt;del&gt;SERVER-3018&lt;/del&gt;&lt;/a&gt;). This is turned off by default, and only makes a difference when the two nodes are constrained by network bandwidth and the data is compressible&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;In our internal sharded clusters, with live use and the balancer enabled, we&apos;ve seen initial sync go from 5-7 days to a few hours.&lt;/p&gt;

&lt;p&gt;Hope this helps.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;


</comment>
                            <comment id="1517842" author="royrez@microsoft.com" created="Tue, 7 Mar 2017 15:45:03 +0000"  >&lt;p&gt;Hi Ramon,&lt;/p&gt;

&lt;p&gt;I watched that ticket.&lt;br/&gt;
Except for specifying that MongoDB 3.4 should be resilient to network issues (without specifying how/why) and that the oplog is copied concurrently with the data (which is not expected to make it faster) I did not see anything else.&lt;br/&gt;
Do you have any full specifications as to why 3.4 is faster, and why/how 3.4 is resumable?&lt;/p&gt;

&lt;p&gt;Roy.&lt;/p&gt;</comment>
                            <comment id="1430101" author="ramon.fernandez" created="Wed, 9 Nov 2016 12:39:29 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=royrez%40microsoft.com&quot; class=&quot;user-hover&quot; rel=&quot;royrez@microsoft.com&quot;&gt;royrez@microsoft.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;3.4 comes with faster, resumable initial sync. We&apos;re working on the documentation for these new features in &lt;a href=&quot;https://jira.mongodb.org/browse/DOCS-8293&quot; title=&quot;Improved initial sync: faster/resumable&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DOCS-8293&quot;&gt;&lt;del&gt;DOCS-8293&lt;/del&gt;&lt;/a&gt;, so feel free to watch that ticket for updates if interested.&lt;/p&gt;

&lt;p&gt;We&apos;ve also published three release candidates. 3.4.0-rc2 is the latest at the time of this writing, and you can download it and test these features. If you do any testing and find any issues please open new SERVER tickets so we can investigate them.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;</comment>
                            <comment id="1427417" author="royrez@microsoft.com" created="Sun, 6 Nov 2016 16:57:33 +0000"  >&lt;p&gt;Is it still planned for 3.4?&lt;/p&gt;</comment>
                            <comment id="1125897" author="scotthernandez" created="Mon, 4 Jan 2016 23:59:48 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=liranms&quot; class=&quot;user-hover&quot; rel=&quot;liranms&quot;&gt;liranms&lt;/a&gt;, thanks for the pull request. I&apos;ve added some comments there. Let&apos;s work on that until we have a plan, and then come back to Jira for the next steps.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dynamike&quot; class=&quot;user-hover&quot; rel=&quot;dynamike&quot;&gt;dynamike&lt;/a&gt;, this has slipped from 3.2 as expected but we are working hard on getting this into the 3.4 release &amp;#8211; to replace both the cloner and data (delta = oplog) replication process. We will have more time to discuss and understand the upstream consequences of increasing replication concurrency and how it will affect end users.&lt;/p&gt;

&lt;p&gt;The current plan is to support parallel copying at the collection level so we can support databases with a lot of collections or a small number of collections in a lot of databases. There may also be support for resuming the cloning process if the initial sync is stopped (for example, due to a system shutdown), so that we clone only the missing collections.&lt;/p&gt;</comment>
                            <comment id="1122063" author="liranms" created="Thu, 24 Dec 2015 14:18:02 +0000"  >&lt;p&gt;It is extremely important to support parallel cloning, especially during the index build phase.&lt;br/&gt;
As it happens, our MongoDB replicas sit on some powerful machines and it&apos;s a waste to use a single core, especially with engines like WiredTiger and RocksDB, which use compression.&lt;br/&gt;
It wouldn&apos;t be as serious had it not prevented us from adding new replicas with different engines, as they take a lot of time to build the index (on the order of days), which causes them to lose sync, even after enlarging the oplog window.&lt;/p&gt;

&lt;p&gt;Regarding the DoS attack that @scotthernandez mentioned, it&apos;s less relevant for the index building stage (which happens on the node itself), so parallelizing this stage would do no harm.&lt;/p&gt;</comment>
                            <comment id="944576" author="dynamike" created="Thu, 18 Jun 2015 18:23:41 +0000"  >&lt;p&gt;Totally agree that keeping the default initial sync rate heavily limited, with the ability to tune it dynamically, is the correct way to do it. Looking forward to the new Data Replicator stuff.&lt;/p&gt;</comment>
                            <comment id="944565" author="scotthernandez" created="Thu, 18 Jun 2015 18:18:37 +0000"  >&lt;p&gt;It is not planned for 3.2 at this time.&lt;/p&gt;

&lt;p&gt;There are too many open questions about performance and the load it would create upstream on the sync source to schedule it now. In addition, there are currently no configurable options for initial sync, and without adaptive load monitoring/control, one would probably want, at a minimum, to control how many collections are cloned at once. We don&apos;t want to introduce a feature that can DoS other members in the replica set during initial sync &amp;#8211; some people have actually seen problems with the current initial sync process causing performance degradation on live systems since it can&apos;t be throttled.&lt;/p&gt;

&lt;p&gt;The good news is that the new Data Replicator components, which we will soon have internally, will allow us to support concurrent clones relatively easily when it is time.&lt;/p&gt;</comment>
                            <comment id="944505" author="dynamike" created="Thu, 18 Jun 2015 17:49:50 +0000"  >&lt;p&gt;Is this planned for 3.2? &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                        <issuelink>
            <issuekey id="284790">SERVER-24069</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="211401">SERVER-19022</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="56514">SERVER-7680</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="245605">SERVER-22061</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="54971">SERVER-7527</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25128"><![CDATA[Replication]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[500A000000YQvxMIAT, 500A000000cCoCyIAK, 5002K00000dY78vQAC, 5002K00000nq6GOQAY, 5002K00000tTv8ZQAS]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 18 Jun 2015 17:49:50 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        6 years, 48 weeks, 4 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>opal.hoyt@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            6 years, 48 weeks, 4 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-repl</customfieldvalue>
            <customfieldvalue>crystal.horn@mongodb.com</customfieldvalue>
            <customfieldvalue>liranms</customfieldvalue>
            <customfieldvalue>dynamike</customfieldvalue>
            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>
            <customfieldvalue>royrez@microsoft.com</customfieldvalue>
            <customfieldvalue>scotthernandez</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrl8af:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrfp7r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrnndz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>