<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 06:30:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-75647] POC: Index build at end for resharding</title>
                <link>https://jira.mongodb.org/browse/SERVER-75647</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;As part of the faster addShard project, which uses resharding to redistribute the data, we are considering building indexes at the end of the process rather than while inserting the data. This is a POC to determine whether doing so yields significant performance gains.&lt;/p&gt;</description>
                <environment></environment>
        <key id="2306207">SERVER-75647</key>
            <summary>POC: Index build at end for resharding</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="matthew.russotto@mongodb.com">Matthew Russotto</assignee>
                                    <reporter username="matthew.russotto@mongodb.com">Matthew Russotto</reporter>
                        <labels>
                    </labels>
                <created>Tue, 4 Apr 2023 13:54:35 +0000</created>
                <updated>Tue, 2 May 2023 14:38:36 +0000</updated>
                            <resolved>Mon, 10 Apr 2023 14:18:21 +0000</resolved>
                                                                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="5373663" author="matthew.russotto" created="Tue, 25 Apr 2023 14:14:10 +0000"  >&lt;p&gt;Correct, but note that _id was an ObjectId and so matched natural order.&lt;/p&gt;</comment>
                            <comment id="5366631" author="geert.bosch" created="Fri, 21 Apr 2023 17:03:38 +0000"  >&lt;p&gt;This was still doing scan+fetch on _id for the cloning phase, right?&lt;/p&gt;</comment>
                            <comment id="5335050" author="matthew.russotto" created="Mon, 10 Apr 2023 14:18:10 +0000"  >&lt;p&gt;The 1G-document (~1 TB of data) run completed, barely. Setup was the same as for the 100M-document run, except that I changed mongo_ebs_size to 1500 (GB).&lt;/p&gt;

&lt;p&gt;Existing code: 3221m (53 hr 40 min). The machines were only provisioned for 48 hours, so the run finished only because the de-provisioning deadline happened not to be enforced precisely. The machines were de-provisioned before I could get full logs.&lt;/p&gt;

&lt;p&gt;Building all indexes except _id at the end: 908m (15 hr 8 min). On the slower shard, 34480 secs (9 hr 34 min 40 sec) were spent cloning and 20012 secs (5 hr 33 min 32 sec) building indexes.&lt;/p&gt;
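&lt;p&gt;The duration conversions above, and the scaling claim that follows, can be checked with a few lines of arithmetic. This is a minimal sketch that only re-derives figures already quoted on this ticket: the 1G-document totals and slower-shard phase times from this comment, and the 100M-document totals (6620 secs for the existing code, 4820 secs for all indexes except _id at end) from the 4 Apr 2023 comment. The helper name hms is illustrative and is not part of the runtest script.&lt;/p&gt;

```python
# Re-derive the timing figures quoted for the 1G-document (~1 TB) run and
# compare them with the 100M-document (~100 GB) run. All inputs come from
# this ticket; nothing here talks to a cluster.

def hms(seconds):
    """Split a duration in whole seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs

# Total wall-clock time of the 1G-document run, in minutes.
existing_1g_min = 3221
index_at_end_1g_min = 908

# Slower shard in the index-at-end run: phase durations in seconds.
cloning_s = 34480
index_build_s = 20012

# Total wall-clock time of the 100M-document run, in seconds.
existing_100m_s = 6620
index_at_end_100m_s = 4820

# Speedup of building indexes at the end over the existing code (1G run).
speedup_1g = existing_1g_min / index_at_end_1g_min

# Runtime growth when the document count grows 10x (100M to 1G).
growth_existing = existing_1g_min * 60 / existing_100m_s
growth_index_at_end = index_at_end_1g_min * 60 / index_at_end_100m_s

if __name__ == "__main__":
    print("index-at-end total (h, m, s):", hms(index_at_end_1g_min * 60))
    print("cloning (h, m, s):", hms(cloning_s))
    print("index build (h, m, s):", hms(index_build_s))
    print(f"1G speedup: {speedup_1g:.2f}x")
    print(f"10x data: existing {growth_existing:.1f}x, "
          f"index-at-end {growth_index_at_end:.1f}x")
```

&lt;p&gt;Run as a script, it prints (15, 8, 0), (9, 34, 40) and (5, 33, 32) for the three durations, a 1G speedup of 3.55x, and runtime growth of 29.2x for the existing code versus 11.3x when building indexes at the end for 10x the documents.&lt;/p&gt;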

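&lt;p&gt;Building indexes at the end is dramatically faster (about 3.5x here), and it scales nearly linearly: with 10x the documents it took about 11x as long as in the 100M-document run (908m vs. 4820 secs), whereas the existing code took about 29x as long (3221m vs. 6620 secs).&lt;/p&gt;</comment>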
                            <comment id="5323406" author="matthew.russotto" created="Tue, 4 Apr 2023 14:56:01 +0000"  >&lt;p&gt;Preliminary results with 100M documents generated by &lt;a href=&quot;https://github.com/pkdone/mongo-mangler&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/pkdone/mongo-mangler&lt;/a&gt; using the pipeline_garaudy.js pipeline (attached). There were 10 secondary indexes, plus the original shard key index (a hashed index on _id, where _id is an ObjectId), plus a hashed index on &apos;cardnumber&apos;, which was used as the new shard key for resharding.&lt;/p&gt;

&lt;p&gt;The 10 secondary indexes were:&lt;/p&gt;
&lt;pre&gt;
{&quot;name.last&quot;: 1, &quot;name.first&quot;: 1, &quot;name.middle&quot;: 1}
{&quot;age&quot;: 1}
{&quot;race&quot;: 1}
{&quot;marital_status&quot;: 1}
{&quot;legal_status&quot;: 1}
{&quot;dependent_count&quot;: 1}
{&quot;income_category&quot;: 1}
{&quot;home_ownership&quot;: 1}
{&quot;employment_status&quot;: 1}
{&quot;employment_industry&quot;: 1}
&lt;/pre&gt;

&lt;p&gt;Configuration was a 2-shard cluster, r6g.2xlarge shards with 3 nodes each, 1 r6g.xlarge mongos and 1 r6g.xlarge config server.&lt;/p&gt;

&lt;p&gt;Existing code (build all indexes while inserting data): 6620 seconds, 6575 spent cloning&lt;/p&gt;

&lt;p&gt;Build all indexes except _id and the new shard key index at end: 5103 seconds, 3383 spent cloning, 1718 spent building indexes&lt;/p&gt;

&lt;p&gt;Build all indexes except _id at end: 4820 seconds, 2891 spent cloning, 1927 spent building indexes&lt;/p&gt;
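&lt;p&gt;To compare the three configurations directly, the sketch below turns the totals above into time saved relative to the existing code and into the share of each run spent in the separate index-build phase. It is plain arithmetic on the figures in this comment; the configuration labels and the summarize helper are illustrative names, not taken from the runtest script.&lt;/p&gt;

```python
# Summarize the 100M-document (~100 GB) timings reported in this comment.
# Each entry maps a configuration to (total, cloning, index build) in seconds.
# The existing code builds indexes while inserting, so it has no separate
# index-build phase (recorded as 0); any remainder of a total is time spent
# outside the two phases.

runs = {
    "existing": (6620, 6575, 0),
    "all but _id and new shard key index at end": (5103, 3383, 1718),
    "all but _id at end": (4820, 2891, 1927),
}

baseline_total = runs["existing"][0]


def summarize(name):
    """Return time saved vs. the existing code and the index-build share."""
    total, _cloning, index_build = runs[name]
    saved = baseline_total - total
    return {
        "saved_s": saved,
        "saved_pct": round(100 * saved / baseline_total, 1),
        "index_build_share_pct": round(100 * index_build / total, 1),
    }


if __name__ == "__main__":
    for name in runs:
        print(name, summarize(name))
```

&lt;p&gt;With these inputs, building all indexes except _id at the end saves 1800 secs (27.2%) over the existing code, and about 40% of that run is the index-build phase; also deferring the new shard key index accounts for roughly 280 secs of the difference between the two index-at-end runs.&lt;/p&gt;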

&lt;p&gt;Looks like this is definitely worth it; even building the one extra index (the new shard key index) at insert time makes a difference of about 280 seconds (5103 vs. 4820). These documents were about 1 KB apiece, so 100M documents is very roughly 100 GB; I&apos;ll do a run with 1G documents (~1 TB of data) for the existing-code case and the &quot;all indexes except _id at end&quot; case.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="2284971">SERVER-74722</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="441070" name="pipeline_garaudy.js" size="4633" author="matthew.russotto@mongodb.com" created="Tue, 4 Apr 2023 14:57:59 +0000"/>
                            <attachment id="441840" name="runtest" size="1489" author="matthew.russotto@mongodb.com" created="Fri, 7 Apr 2023 17:56:00 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 21 Apr 2023 17:03:38 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        41 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-2322</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>randolph@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            41 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>geert.bosch@mongodb.com</customfieldvalue>
            <customfieldvalue>matthew.russotto@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i23993:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i1llgw:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="7169">Repl 2023-04-17</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i22vef:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>