<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:48:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-38205] Optimize splitVector for the jumbo-chunk case</title>
                <link>https://jira.mongodb.org/browse/SERVER-38205</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;If a chunk only contains a single shard key (or very few shard keys), it will be marked as jumbo and not be moveable by the balancer. However, the autosplitter will continue to try to split this chunk periodically, even if there&apos;s only a single unique key, which would mean that it could never be split. There are several ways we could optimize for this case:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;In splitVector, we can do a lookup at the min key and a backward lookup at the max key, and if the key prior to the max key is the same as the min key, then we know the entire chunk consists of a unique key and we can skip having to scan the chunk.&lt;/li&gt;
	&lt;li&gt;In splitVector, while scanning, if we decide that a key &lt;em&gt;X&lt;/em&gt; should be a split key, we can skip to the next unique key rather than scanning through the rest of the documents for &lt;em&gt;X&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;
</description>
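<!--
Illustrative sketch (not part of the original ticket, and not the MongoDB implementation):
the two optimizations listed in the description can be modelled on a sorted Python list that
stands in for the shard key index. The function names, the max_docs_per_chunk parameter and
the half-open [min_key, max_key) range convention are assumptions made for this example only.

```python
from bisect import bisect_left, bisect_right


def is_single_key_range(keys, min_key, max_key) -> bool:
    """Optimization 1: check both ends of the range instead of scanning it.

    keys is a sorted list standing in for the shard key index (an assumption).
    The range is treated as half-open: min_key inclusive, max_key exclusive.
    """
    lo = bisect_left(keys, min_key)  # forward lookup at the min key
    hi = bisect_left(keys, max_key)  # first position at or past the max key
    if lo == hi:
        return False  # empty range, so there is no key at all
    # The key just before max_key equals the first key: one distinct key only.
    return keys[lo] == keys[hi - 1]


def split_points(keys, min_key, max_key, max_docs_per_chunk):
    """Optimization 2: after choosing a split key, jump to the next distinct key."""
    lo = bisect_left(keys, min_key)
    hi = bisect_left(keys, max_key)
    if lo == hi or keys[lo] == keys[hi - 1]:
        return []  # empty or single-key range: nothing to split, no scan needed
    splits = []
    count = 0
    i = lo
    while i < hi:
        count += 1
        key = keys[i]
        # The first key of the range cannot be a split key, because the chunk
        # to its left would be empty.
        if count > max_docs_per_chunk and key != keys[lo]:
            splits.append(key)
            count = 0
            # Skip the remaining duplicates of key instead of visiting each one.
            i = bisect_right(keys, key, i, hi)
        else:
            i += 1
    return splits
```

Because the duplicates of a chosen split key are skipped, they are not counted toward the next
chunk in this sketch; that is the trade-off that comes with skipping ahead.
-->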
                <environment></environment>
        <key id="636104">SERVER-38205</key>
            <summary>Optimize splitVector for the jumbo-chunk case</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13201">Fixed</resolution>
                                        <assignee username="kevin.pulo@mongodb.com">Kevin Pulo</assignee>
                                    <reporter username="matthew.saltz@mongodb.com">Matthew Saltz</reporter>
                        <labels>
                    </labels>
                <created>Mon, 19 Nov 2018 16:59:21 +0000</created>
                <updated>Sun, 29 Oct 2023 22:26:26 +0000</updated>
                            <resolved>Fri, 8 Feb 2019 01:22:53 +0000</resolved>
                                                    <fixVersion>3.6.15</fixVersion>
                    <fixVersion>4.0.7</fixVersion>
                    <fixVersion>4.1.8</fixVersion>
                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>10</watches>
                                                                                                                <comments>
                            <comment id="2422298" author="xgen-internal-githook" created="Mon, 16 Sep 2019 22:35:09 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Kevin Pulo&apos;, &apos;username&apos;: &apos;devkev&apos;, &apos;email&apos;: &apos;kevin.pulo@mongodb.com&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-38205&quot; title=&quot;Optimize splitVector for the jumbo-chunk case&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-38205&quot;&gt;&lt;del&gt;SERVER-38205&lt;/del&gt;&lt;/a&gt; avoid splitVector scan when range contains single key&lt;/p&gt;

&lt;p&gt;(cherry picked from commit 4da738debb1aea49524ff8e364254afb5bfda612)&lt;br/&gt;
Branch: v3.6&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/f1d805f6ee74fd3399bcd9a170281c93a7a44405&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/f1d805f6ee74fd3399bcd9a170281c93a7a44405&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2143762" author="xgen-internal-githook" created="Sun, 10 Feb 2019 23:54:47 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Kevin Pulo&apos;, &apos;email&apos;: &apos;kevin.pulo@mongodb.com&apos;, &apos;username&apos;: &apos;devkev&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-38205&quot; title=&quot;Optimize splitVector for the jumbo-chunk case&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-38205&quot;&gt;&lt;del&gt;SERVER-38205&lt;/del&gt;&lt;/a&gt; avoid splitVector scan when range contains single key&lt;/p&gt;

&lt;p&gt;(cherry picked from commit 4da738debb1aea49524ff8e364254afb5bfda612)&lt;br/&gt;
Branch: v4.0&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/a79ff87dbdffd86e84da2703a256d56499d72cd2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/a79ff87dbdffd86e84da2703a256d56499d72cd2&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2141986" author="xgen-internal-githook" created="Fri, 8 Feb 2019 01:18:47 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Kevin Pulo&apos;, &apos;email&apos;: &apos;kevin.pulo@mongodb.com&apos;, &apos;username&apos;: &apos;devkev&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-38205&quot; title=&quot;Optimize splitVector for the jumbo-chunk case&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-38205&quot;&gt;&lt;del&gt;SERVER-38205&lt;/del&gt;&lt;/a&gt; avoid splitVector scan when range contains single key&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/4da738debb1aea49524ff8e364254afb5bfda612&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/4da738debb1aea49524ff8e364254afb5bfda612&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2075815" author="matthew.saltz" created="Thu, 29 Nov 2018 18:51:11 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=david.storch&quot; class=&quot;user-hover&quot; rel=&quot;david.storch&quot;&gt;david.storch&lt;/a&gt;, your (1) is indeed the idea for implementing solution 1 in the ticket. I think another way to implement solution 2 would be, instead of seeking to the next key, to open a new InternalPlanner with a new query for key &amp;gt; currentKeyBeingLookedAt.&lt;/p&gt;

&lt;p&gt;If that would work I think doing solution 2 would be best and most general purpose, since like Kal said the first is limited if we have a situation with a high cardinality key in the middle of a chunk.&lt;/p&gt;</comment>
                            <comment id="2073571" author="david.storch" created="Tue, 27 Nov 2018 22:59:28 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kaloian.manassiev&quot; class=&quot;user-hover&quot; rel=&quot;kaloian.manassiev&quot;&gt;kaloian.manassiev&lt;/a&gt;, the ability to perform arbitrary inclusive or exclusive index seeks is definitely supported by the storage subsystem&apos;s &lt;tt&gt;SortedDataInterface&lt;/tt&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/mongodb/mongo/blob/6efa4ed0820b6f6e3a2615dc5f42e13ce3415ad8/src/mongo/db/storage/sorted_data_interface.h#L265-L301&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/blob/6efa4ed0820b6f6e3a2615dc5f42e13ce3415ad8/src/mongo/db/storage/sorted_data_interface.h#L265-L301&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We take advantage of this to skip keys in the query layer, in particular for the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/distinct_scan.h&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;DISTINCT_SCAN stage&lt;/a&gt;. However, I don&apos;t think this is exposed by the &lt;tt&gt;InternalPlanner&lt;/tt&gt; in a way that &lt;tt&gt;splitVector&lt;/tt&gt; could use. You could either&lt;/p&gt;

&lt;p&gt;1) Use the &lt;tt&gt;InternalPlanner&lt;/tt&gt; to seek forwards (inclusive) from the min, limited to one key. Then use the &lt;tt&gt;InternalPlanner&lt;/tt&gt; again to seek backwards (exclusive) from the max, also limited to one key. I believe this is how you would implement solution 1 above?&lt;/p&gt;

&lt;p&gt;2) Circumvent the query layer and use the storage interface directly. One problem with this approach is that you would lose the code responsible for yielding and &lt;tt&gt;WriteConflictException&lt;/tt&gt; handling.&lt;/p&gt;</comment>
                            <comment id="2068584" author="kevin.pulo@10gen.com" created="Wed, 21 Nov 2018 01:02:25 +0000"  >&lt;p&gt;There&apos;s also a &quot;0&quot; optimisation, which is that splitVector() should &lt;a href=&quot;https://github.com/mongodb/mongo/blob/r4.1.5/src/mongo/db/s/split_vector.cpp#L107&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;short-circuit here if minKey == maxKey&lt;/a&gt;.  Currently in this situation it will still scan the shard key index entries for this shard key value (pointlessly, because there&apos;s no hope of finding any split points) (and I can&apos;t see any higher-level code which prevents splitVector() from being called with min == max).&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;if a high-cardinality key is in the middle of a chunk, it is possible that the low-cardinality keys that precede it may never get split away&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This is true, but it doesn&apos;t preclude the optimisations &amp;#8212; if 0 or 1 were implemented then it would at least allow the workaround of manually splitting immediately around the (manually identified) low-cardinality (high duplication) shard key value.&lt;/p&gt;

&lt;p&gt;Future auto-splitter work could try to paint a more detailed picture of the statistical distribution of shard key values encountered during splitVector of a chunk, and so better handle situations like this.&lt;/p&gt;</comment>
                            <comment id="2067632" author="kaloian.manassiev" created="Tue, 20 Nov 2018 13:06:08 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=matthew.saltz&quot; class=&quot;user-hover&quot; rel=&quot;matthew.saltz&quot;&gt;matthew.saltz&lt;/a&gt;, in addition, when we spoke yesterday we realized that there is a bug with the chunk splitter where if a high-cardinality key is in the middle of a chunk, it is possible that the low-cardinality keys that precede it may never get split away. Having a situation like this would preclude optimization #1, wouldn&apos;t it?&lt;/p&gt;</comment>
                            <comment id="2067631" author="kaloian.manassiev" created="Tue, 20 Nov 2018 13:04:20 +0000"  >&lt;p&gt;Solution 1 sounds like it should be fairly easy to implement without adding undue load to the &lt;tt&gt;splitVector&lt;/tt&gt; command, so that would be my preference.&lt;/p&gt;

&lt;p&gt;For 2, I am not sure whether the query stage exposes a way to &quot;jump over&quot; to the next unique key. &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=david.storch&quot; class=&quot;user-hover&quot; rel=&quot;david.storch&quot;&gt;david.storch&lt;/a&gt;, do you know whether this would be possible with &lt;a href=&quot;https://github.com/mongodb/mongo/blob/ac2880e51b9b540a24f9babc632a89c23d0b51b4/src/mongo/db/s/split_vector.cpp#L165&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this usage&lt;/a&gt; of the IndexScan executor?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10420">
                    <name>Backports</name>
                                            <outwardlinks description="backported by">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10520">
                    <name>Problem/Incident</name>
                                            <outwardlinks description="causes">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_12450" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Backport Requested</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="15640"><![CDATA[v4.0]]></customfieldvalue>
    <customfieldvalue key="15141"><![CDATA[v3.6]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000fNIPdQAO]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 20 Nov 2018 13:04:20 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 21 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 21 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>xgen-internal-githook</customfieldvalue>
            <customfieldvalue>kaloian.manassiev@mongodb.com</customfieldvalue>
            <customfieldvalue>kevin.pulo@mongodb.com</customfieldvalue>
            <customfieldvalue>matthew.saltz@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hud9mf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr8l8n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="2640">Sharding 2018-12-31</customfieldvalue>
    <customfieldvalue id="2725">Sharding 2019-01-14</customfieldvalue>
    <customfieldvalue id="2726">Sharding 2019-01-28</customfieldvalue>
    <customfieldvalue id="2786">Sharding 2019-02-11</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11861" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>User Summary</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="11856"><![CDATA[Not Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hucvvr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>