<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 02:58:51 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-2065] Sharding on arrays</title>
                <link>https://jira.mongodb.org/browse/SERVER-2065</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Consider the following:&lt;/p&gt;

&lt;p&gt;conversation&lt;br/&gt;
{&lt;br/&gt;
  users: [id1, id2, id3, ...],&lt;br/&gt;
  ...other conversation data...&lt;br/&gt;
}&lt;/p&gt;

&lt;p&gt;Displaying a list of a single user&apos;s conversations with no further restrictions is impossible without querying every shard. Even restructuring the collection doesn&apos;t fix the problem.&lt;/p&gt;

&lt;p&gt;This can be worked around at the application level by creating and maintaining a separate collection with one entry per user, but that isn&apos;t very elegant. If the user needs to be able to arbitrarily filter their list of conversations, this gets worse, because most or all of the data needs to be available at query time, and therefore a large amount of data needs to be duplicated per user.&lt;/p&gt;

&lt;p&gt;When duplicating at the code level, it is necessary to create one duplicate per entry, regardless of whether they actually get placed on different shards, because they MAY get placed on different shards, or may get rebalanced to another shard later. Sharding based on array contents would still require duplication sometimes, but it could be greatly reduced, and may not require any duplication at all if all entries in the array resolve to the same shard.&lt;/p&gt;

&lt;p&gt;The logical implementation for this is actually fairly straightforward:&lt;/p&gt;

&lt;p&gt;Insert:&lt;br/&gt;
1. Look at the elements in the array and determine which shards are within range of any of the elements&lt;br/&gt;
2. Insert the record on each shard&lt;/p&gt;

&lt;p&gt;Update:&lt;br/&gt;
1. Look up the complete sharded array from any copy of the record using the provided shard key&lt;br/&gt;
2. If the sharded array is being modified, determine whether the list of shards it resides on will change, and remove it from or insert it into those shards as needed.&lt;br/&gt;
3. Update the record on all shards&lt;/p&gt;

&lt;p&gt;Delete:&lt;br/&gt;
1. Look up the complete sharded array from any copy of the record using the provided shard key&lt;br/&gt;
2. Remove it from all shards it resides on&lt;/p&gt;
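
&lt;p&gt;The insert, update and delete steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not MongoDB code: the ArrayShardRouter class, its method names, the integer user ids and the fixed split points are all hypothetical, and each shard is modelled as a plain dictionary keyed by a document id.&lt;/p&gt;

&lt;pre&gt;
```python
from bisect import bisect_right


class ArrayShardRouter:
    """Toy model (hypothetical): each shard owns a contiguous key range, and a
    document is copied to every shard whose range holds one of its array elements."""

    def __init__(self, split_points):
        # split_points [10, 20] gives three shards: below 10, 10 to 19, 20 and up.
        self.split_points = sorted(split_points)
        self.shards = [dict() for _ in range(len(self.split_points) + 1)]

    def shards_for(self, keys):
        # Insert step 1: the set of shards in range of any array element.
        return {bisect_right(self.split_points, k) for k in keys}

    def insert(self, doc_id, doc, field="users"):
        # Insert step 2: place one copy of the record on each of those shards.
        for s in self.shards_for(doc[field]):
            self.shards[s][doc_id] = dict(doc)

    def find_copy(self, doc_id, shard_key):
        # Any single element of the array is enough to locate one copy.
        return self.shards[bisect_right(self.split_points, shard_key)].get(doc_id)

    def update(self, doc_id, shard_key, changes, field="users"):
        old = self.find_copy(doc_id, shard_key)
        if old is None:
            return False
        new = {**old, **changes}
        old_shards = self.shards_for(old[field])
        new_shards = self.shards_for(new[field])
        # Drop copies from shards the array no longer maps to.
        for s in old_shards - new_shards:
            del self.shards[s][doc_id]
        # Write the new version to every shard it now belongs on.
        for s in new_shards:
            self.shards[s][doc_id] = dict(new)
        return True

    def delete(self, doc_id, shard_key, field="users"):
        old = self.find_copy(doc_id, shard_key)
        if old is None:
            return False
        for s in self.shards_for(old[field]):
            del self.shards[s][doc_id]
        return True


router = ArrayShardRouter([10, 20])
router.insert("c1", {"users": [3, 12], "topic": "lunch"})  # copies on shards 0 and 1
router.update("c1", 3, {"users": [3, 25]})                 # copies on shards 0 and 2
router.delete("c1", 25)                                    # no copies remain
```
&lt;/pre&gt;

&lt;p&gt;In this model a document whose array elements all fall in one range is stored exactly once, which is the reduced duplication described above.&lt;/p&gt;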

&lt;p&gt;I don&apos;t know for sure whether or not this would complicate re-balancing, but I don&apos;t think so. Unless I&apos;ve missed something you SHOULD be able to treat each value as effectively distinct for this. When you split the chunk, just split the records as needed. The catch here is that the actual gains may be somewhat unpredictable, especially when the split was inspired by high disk space use. In any case though, it couldn&apos;t be any &lt;em&gt;worse&lt;/em&gt; than having to duplicate everything all the time, even when it isn&apos;t needed.&lt;/p&gt;</description>
                <environment></environment>
        <key id="13629">SERVER-2065</key>
            <summary>Sharding on arrays</summary>
                <type id="2" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14711&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="12300">Won&apos;t Do</resolution>
                                        <assignee username="backlog-server-sharding">[DO NOT USE] Backlog - Sharding Team</assignee>
                                    <reporter username="bugslayer">John Crenshaw</reporter>
                        <labels>
                            <label>tommaso-triage</label>
                    </labels>
                <created>Fri, 5 Nov 2010 09:01:24 +0000</created>
                <updated>Tue, 6 Dec 2022 05:47:06 +0000</updated>
                            <resolved>Thu, 2 Dec 2021 13:36:11 +0000</resolved>
                                                    <fixVersion>features we&apos;re not sure of</fixVersion>
                                    <component>Sharding</component>
                                        <votes>2</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="83038" author="eliot" created="Sat, 28 Jan 2012 04:33:43 +0000"  >&lt;p&gt;@julian - you would never get more than 1 copy of a document &lt;/p&gt;</comment>
                            <comment id="82730" author="jdcoder36" created="Fri, 27 Jan 2012 04:29:37 +0000"  >&lt;p&gt;How would this affect find()? Would more than one result be returned when there is a match on more than one element in the array ($in)?&lt;/p&gt;

&lt;p&gt;Currently I duplicate entries for documents that are &apos;imitating&apos; sharding on an array, but I have to identify duplicates when searching the collection and remove them at the &apos;code level&apos; (at least until distinct() is a cursor).&lt;/p&gt;

&lt;p&gt;Even if the gains in disk space are marginal, this could be a good idea to simplify searching as well.&lt;/p&gt;</comment>
                            <comment id="48878" author="redbeard0531" created="Tue, 16 Aug 2011 17:46:16 +0000"  >&lt;p&gt;If we decide to support this in the future, we will need to undo the restrictions in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-3586&quot; title=&quot;Array as shard key value should be prohibited.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-3586&quot;&gt;&lt;del&gt;SERVER-3586&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="20924">SERVER-3586</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25141"><![CDATA[Sharding]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 16 Aug 2011 17:46:16 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        12 years, 3 weeks, 4 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            12 years, 3 weeks, 4 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-sharding</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>bugslayer</customfieldvalue>
            <customfieldvalue>jdcoder36</customfieldvalue>
            <customfieldvalue>mathias@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrpbpr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrg0cn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6603</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsvsfz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>