<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:45:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-37418] Background index builds should batch collection scan reads and inserts into the index</title>
                <link>https://jira.mongodb.org/browse/SERVER-37418</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;For every document in a collection, background index builds retrieve the document, insert it into the index, and then call &lt;a href=&quot;https://github.com/mongodb/mongo/blob/175f5e3c25ddba439b7d28254a4af5504aded0d8/src/mongo/db/catalog/index_create_impl.cpp#L398&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;saveState()&lt;/a&gt; (which resets the cursor) and restoreState() (which repositions it) after every single insert.&lt;/p&gt;

&lt;p&gt;It would be more efficient to batch reads on the collection and inserts into the index so the read cursors are reset less often.&lt;/p&gt;

&lt;p&gt;If we want to take advantage of read_once cursors, this will work around having to read the same page into cache when there are multiple documents in each page.&lt;/p&gt;</description>
                <environment></environment>
        <key id="612494">SERVER-37418</key>
            <summary>Background index builds should batch collection scan reads and inserts into the index</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13203">Gone away</resolution>
                                        <assignee username="louis.williams@mongodb.com">Louis Williams</assignee>
                                    <reporter username="louis.williams@mongodb.com">Louis Williams</reporter>
                        <labels>
                            <label>nyc</label>
                    </labels>
                <created>Mon, 1 Oct 2018 22:10:22 +0000</created>
                <updated>Fri, 27 Oct 2023 20:43:14 +0000</updated>
                            <resolved>Thu, 17 Jan 2019 16:48:04 +0000</resolved>
                                                                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="2118459" author="louis.williams" created="Thu, 17 Jan 2019 16:48:04 +0000"  >&lt;p&gt;This is no longer necessary as &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-37270&quot; title=&quot;Remove foreground index build functionality&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-37270&quot;&gt;&lt;del&gt;SERVER-37270&lt;/del&gt;&lt;/a&gt; performs all index builds with the external sorter.&lt;/p&gt;</comment>
                            <comment id="2112852" author="louis.williams" created="Fri, 11 Jan 2019 20:27:50 +0000"  >&lt;p&gt;Will be implemented for hybrid builds in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-37270&quot; title=&quot;Remove foreground index build functionality&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-37270&quot;&gt;&lt;del&gt;SERVER-37270&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2047098" author="milkie" created="Wed, 31 Oct 2018 04:38:18 +0000"  >&lt;p&gt;Since this project is going to stop doing index table writes during the collection scan phase (instead, the data will be written to the external sorter), I don&apos;t think there is that much work to be done here.  We can do the batching without worrying about handling write conflict exceptions any differently than we already do today.  However, we cannot do this work until later in the project.&lt;/p&gt;</comment>
                            <comment id="2046552" author="louis.williams" created="Tue, 30 Oct 2018 18:33:22 +0000"  >&lt;p&gt;After some testing locally, removing the save/restore code will speed up background builds by about 20%. Additionally, batching scans and inserts into groups of 1000 brings that figure to about 100%. This is with an incomplete implementation.&lt;/p&gt;

&lt;p&gt;To support write conflicts in the middle of batches, the collection scans need to yield and be resumed at the first Record of the failed batch. CollectionScans don&apos;t currently support that behavior.&lt;/p&gt;

&lt;p&gt;Currently I see a few solutions:&lt;/p&gt;

&lt;p&gt;1. Buffer all intermediate collection scan results in memory until they are committed. We would also want to expose an &quot;isGoingToYield()&quot; method on the PlanExecutor that hints about an upcoming yield. In this way we can proactively commit an outstanding WriteUnitOfWork before a yield takes place and adhere to existing collection scan yielding rules. Write conflicts would just start inserting from the beginning of the buffer.&lt;br/&gt;
2. Expose a &quot;repositionOnRecord()&quot; operation on the PlanExecutor. I&apos;m not sure how exactly this would work in terms of subclasses other than CollectionScan. This would also require something akin to an &quot;isGoingToYield()&quot; method.&lt;br/&gt;
3. Read using a direct RecordStore cursor and handle all yielding/locking manually. The cursor can easily be repositioned between yields/WCEs. It would likely need to conform to existing yielding rules for collection scans, but this also seems like the most reasonable approach to me.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=milkie&quot; class=&quot;user-hover&quot; rel=&quot;milkie&quot;&gt;milkie&lt;/a&gt; What do you think about these options? Is the performance gain here worth the work?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                            <outwardlinks description="depends on">
                                        <issuelink>
            <issuekey id="608277">SERVER-37270</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="608276">SERVER-37269</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 31 Oct 2018 04:38:18 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 3 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[<s><a href='https://jira.mongodb.org/browse/SERVER-37270'>SERVER-37270</a></s>]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-663</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 3 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>milkie@mongodb.com</customfieldvalue>
            <customfieldvalue>louis.williams@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hu9ar3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hu5muv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="2708">Storage NYC 2019-01-14</customfieldvalue>
    <customfieldvalue id="2709">Storage NYC 2019-01-28</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10555" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Story Points</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hu8x0f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>