<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:59:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-42237] Allow shard key to be prefix of multikey index </title>
                <link>https://jira.mongodb.org/browse/SERVER-42237</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Currently it is not possible to use a prefix of a multikey index as a shard key.&lt;/p&gt;

&lt;p&gt;For this reason, even if the shard key is {a:1} and the index {a:1, b:1} already exists and is considered multikey ONLY because the field &lt;tt&gt;b&lt;/tt&gt; is an array, we must still have either an index on {a:1} or a non-multikey index with &lt;tt&gt;a&lt;/tt&gt; as a prefix.&lt;/p&gt;

&lt;p&gt;Allowing {a:1, b:1} to be used for sharding would save RAM, space and write overhead.&lt;/p&gt;</description>
                <environment></environment>
        <key id="858297">SERVER-42237</key>
            <summary>Allow shard key to be prefix of multikey index </summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="12300">Won&apos;t Do</resolution>
                                        <assignee username="backlog-server-sharding-emea">[DO NOT USE] Backlog - Sharding EMEA</assignee>
                                    <reporter username="renato.riccio@mongodb.com">Renato Riccio</reporter>
                        <labels>
                    </labels>
                <created>Mon, 15 Jul 2019 17:03:35 +0000</created>
                <updated>Tue, 6 Dec 2022 02:53:59 +0000</updated>
                            <resolved>Tue, 28 Jun 2022 06:42:39 +0000</resolved>
                                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>8</watches>
                                                                                                                <comments>
                            <comment id="4643866" author="pierlauro.sciarelli" created="Tue, 28 Jun 2022 06:42:39 +0000"  >&lt;p&gt;Thanks to &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn@mongodb.com&quot;&gt;max.hirschhorn@mongodb.com&lt;/a&gt; for pointing out that the query subsystem is responsible for deduplicating by RecordId when the IndexScan scans a multikey index. The bug mentioned above does not exist, so I am closing the ticket.&lt;/p&gt;</comment>
                            <comment id="4599263" author="pierlauro.sciarelli" created="Tue, 7 Jun 2022 12:55:42 +0000"  >&lt;p&gt;Reopening the ticket: sharding a collection that has only multikey indexes is not allowed, but the following faulty flow is still possible:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;&lt;tt&gt;shardCollection&lt;/tt&gt; succeeds because there is a proper index&lt;/li&gt;
	&lt;li&gt;The original shard key index is dropped and only a multikey index with the shard key as prefix remains available&lt;/li&gt;
	&lt;li&gt;From now on, some sharding machinery (e.g. auto-splitter and migration cloning) starts using the multikey index&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;(check usages of &lt;a href=&quot;https://github.com/mongodb/mongo/search?q=requireSingleKey&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;requireSingleKey&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;cite&gt;A shard key index scan has to skip over these extra index entries, and de-duplicate them to ensure that each relevant document isn&apos;t handled repeatedly&lt;/cite&gt;&lt;/p&gt;

&lt;p&gt;If this observation still holds, we have a bug, since the code does not handle this case.&lt;/p&gt;</comment>
                            <comment id="2409116" author="ratika.gandhi" created="Thu, 5 Sep 2019 15:59:50 +0000"  >&lt;p&gt;Good idea, but there are other higher-priority items on the backlog to tackle.&lt;/p&gt;</comment>
                            <comment id="2356020" author="kaloian.manassiev" created="Fri, 2 Aug 2019 14:41:56 +0000"  >&lt;p&gt;To be discussed at the next eng/product sync meeting.&lt;/p&gt;</comment>
                            <comment id="2349170" author="kevin.pulo@10gen.com" created="Tue, 30 Jul 2019 00:47:27 +0000"  >&lt;p&gt;Historically, this restriction is because in older versions the entire index was marked multikey, and it wasn&apos;t possible to tell if that was because of field a or b.  I believe that is no longer the case in current versions, so this should be possible.&lt;/p&gt;

&lt;p&gt;To expand on this a little, there are two issues here.&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;A prefix index on fields A + B (even when B is non-multikey) will be less efficient than an exact match shard key index on just A, because the index entries include values for the other fields (B), making them bigger.  In extreme cases this can make the index much bigger, and can slow down the shard key index scan.  The index might also have more contention if the B values are updated often.&lt;/li&gt;
	&lt;li&gt;Multikey indexes make this problem worse, because now each document (with a single A value) has multiple &quot;irrelevant&quot; index entries as a result of the array values on the B field.  A shard key index scan has to skip over these extra index entries, and de-duplicate them to ensure that each relevant document isn&apos;t handled repeatedly.  I believe the query subsystem should handle this, but it&apos;s extra work, making the scan slower.  The index will also be bigger as a result of these extra index entries.  The exact match shard key index on just A would not have these extra index entries (in addition to not having any of the B values at all, as above).&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;Another aspect is that it&apos;s currently possible to get into this state by sharding with a non-multikey A + B index, which then later becomes multikey on B.  The inverse situation of sharding with an A + (multikey)B index is not possible, despite being equivalent.  Similarly, it&apos;s possible to shard with an A index and A + (multikey)B, and then drop the A index.&lt;/p&gt;

&lt;p&gt;The issue here is that these combinations of factors, mixed in with other relevant factors like the rest of the query workload, index/data/storage sizes, etc., mean that it may be hard to know the exact set of circumstances where having the A + (multikey)B index would be an overall win, compared to having both the A + (multikey)B and the solo A index (and vice versa, where it would be an overall loss).  That doesn&apos;t necessarily mean we shouldn&apos;t do it; I just want to make sure we&apos;re aware of the issues here.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="233823">SERVER-20857</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25133"><![CDATA[Sharding EMEA]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 30 Jul 2019 00:47:27 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        1 year, 32 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            1 year, 32 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-sharding-emea</customfieldvalue>
            <customfieldvalue>kaloian.manassiev@mongodb.com</customfieldvalue>
            <customfieldvalue>kevin.pulo@mongodb.com</customfieldvalue>
            <customfieldvalue>pierlauro.sciarelli@mongodb.com</customfieldvalue>
            <customfieldvalue>ratika.gandhi@mongodb.com</customfieldvalue>
            <customfieldvalue>renato.riccio@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvejyn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hso2j3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="3197">Sharding 2019-08-26</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hve67z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>