<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:42:02 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-16715] Distribution of data with hashed shard key suddenly biased toward few shards</title>
                <link>https://jira.mongodb.org/browse/SERVER-16715</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;On 2.8.0-rc4, both mmapv1 and wiredTiger, I&apos;ve observed a peculiar biasing of chunks toward a seemingly random 1 or 2 shards out of the 10 total for new sharded collections.&lt;/p&gt;

&lt;p&gt;The workload of the application is single-threaded and roughly as follows:&lt;br/&gt;
1.) programmatically creates a sharded database and a sharded collection&lt;br/&gt;
2.) creates the {&lt;tt&gt;_id:&quot;hashed&quot;&lt;/tt&gt;} index and a {&lt;tt&gt;_id:1&lt;/tt&gt;} index.&lt;br/&gt;
3.) inserts ~220k documents, each about 2kB in size, with a string _id.&lt;br/&gt;
4.) repeats from step #1, flipping back and forth between two databases, but always on a new sharded collection. That is, when the workload completes, there are two sharded databases, each with 24 sharded collections on {&lt;tt&gt;_id:&quot;hashed&quot;&lt;/tt&gt;}, each collection containing ~220k documents.&lt;/p&gt;

&lt;p&gt;Initially the workload application starts out distributing chunks across all shards evenly, as expected, for each new sharded collection. However, at some indeterminate point, when a new collection is created and sharded, it&apos;s as though 1 or 2 shards suddenly become &quot;sinks&quot; for a skewed majority (~80%) of all inserts. The other shards &lt;em&gt;do&lt;/em&gt; receive some of the writes/chunks for the collection, but most are biased toward these 1 or 2 &quot;select&quot; shards.&lt;/p&gt;
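
&lt;p&gt;To make the expected behaviour concrete, the following is only a rough sketch, not the workload application and not the server hashing code. It assumes a stand-in hash (the first 8 bytes of MD5), hypothetical string _ids of the form doc-N, and 10 equal-width ranges of a 64-bit hash space, one per shard. Under those assumptions, ~220k string _ids should land at roughly 10% per shard, which is the even split seen at the start of the run:&lt;/p&gt;

```python
import hashlib
from collections import Counter

NUM_SHARDS = 10          # from the report: 10 shards
NUM_DOCS = 220_000       # from the report: ~220k documents per collection
HASH_SPACE = 2 ** 64     # assumption: 64-bit hash space for this sketch


def stand_in_hash(doc_id):
    """Stand-in for a hashed shard key: first 8 bytes of MD5 as an unsigned int."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Map the hash onto one of num_shards equal-width ranges of the hash space."""
    return stand_in_hash(doc_id) * num_shards // HASH_SPACE


def distribution(num_docs=NUM_DOCS, num_shards=NUM_SHARDS):
    """Count how many hypothetical string _ids land on each shard."""
    return Counter(shard_for(f"doc-{i}", num_shards) for i in range(num_docs))


if __name__ == "__main__":
    counts = distribution()
    for shard in range(NUM_SHARDS):
        share = counts[shard] / NUM_DOCS
        print(f"shard {shard}: {counts[shard]} docs ({share:.1%})")
```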

&lt;p&gt;After the workload completes, the balancer does eventually redistribute all chunks evenly.&lt;/p&gt;

&lt;p&gt;I&apos;ve had difficulty reproducing with a simpler setup, so I&apos;m attaching some logs for the wiredTiger run where exactly 1 shard (&quot;old_8&quot;) was the biased shard:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;mongos.log.2015-01-03T21-22-44 - single mongos the workload was talking to&lt;/li&gt;
	&lt;li&gt;mongodb.primary-recipient.log.2015-01-03T21-22-31 - the primary of the suddenly &quot;hot&quot; biased shard &quot;old_8&quot;&lt;/li&gt;
	&lt;li&gt;mongodb.first-configsvr.log.2015-01-03T21-22-34 - first config server in the list of 3.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Timing of logs:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;on Jan 3rd about 18:06 UTC the run begins&lt;/li&gt;
	&lt;li&gt;on Jan 3rd about 19:55 UTC the shard old_8 primary &quot;JM-c3x-wt-12.rrd-be.54a0a5b5e4b068b9df8d7b9c.mongodbdns.com:27001&quot; begins receiving the lion&apos;s share of writes. (opcounter screenshot from MMS attached, as well as MMS chunks chart for an example biased collection.)&lt;br/&gt;
&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/60796/60796_Image+2015-01-04+at+3.57.31+PM.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/li&gt;
	&lt;li&gt;in general, the issue seems to start with collections after &quot;benchmarks2015010500&quot; in both databases. E.g., &quot;benchmark2015010500&quot;, &quot;benchmark2015010501&quot;, &quot;benchmark2015010502&quot;, etc.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Environment:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;EC2 east, Ubuntu 14.04 c3.xlarge&lt;/li&gt;
	&lt;li&gt;All nodes on same VPC subnet&lt;/li&gt;
	&lt;li&gt;2.8.0-rc4, 1GB oplog, journaling enabled&lt;/li&gt;
	&lt;li&gt;Logs attached are for wiredTiger, but have observed on both engines.&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment></environment>
        <key id="176842">SERVER-16715</key>
            <summary>Distribution of data with hashed shard key suddenly biased toward few shards</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="siyuan.zhou@mongodb.com">Siyuan Zhou</assignee>
                                    <reporter username="john.morales@mongodb.com">John Morales</reporter>
                        <labels>
                    </labels>
                <created>Sun, 4 Jan 2015 21:06:22 +0000</created>
                <updated>Sat, 24 Jan 2015 17:18:18 +0000</updated>
                            <resolved>Wed, 21 Jan 2015 18:20:40 +0000</resolved>
                                    <version>2.8.0-rc4</version>
                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>11</watches>
                                                                                                                <comments>
                            <comment id="796626" author="john.morales@10gen.com" created="Sun, 4 Jan 2015 21:39:33 +0000"  >&lt;p&gt;Also attaching a copy/paste from my console of the &lt;tt&gt;sh.status()&lt;/tt&gt; output near the time the workload in question was running. I figured it might be useful, since there are many messages under &quot;Migration Results for the last 24 hours:&quot;. It also shows a snapshot of how the chunks were skewed toward &quot;old_8&quot; (before the balancer rebalanced the cluster).&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                            <outwardlinks description="depends on">
                                        <issuelink>
            <issuekey id="71153">SERVER-9287</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="84629">SERVER-10430</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="179978">SERVER-16969</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="60796" name="Image 2015-01-04 at 3.57.31 PM.png" size="26987" author="john.morales@mongodb.com" created="Sun, 4 Jan 2015 21:28:26 +0000"/>
                            <attachment id="60795" name="Screen Shot 2015-01-04 at 4.02.11 PM.png" size="50484" author="john.morales@mongodb.com" created="Sun, 4 Jan 2015 21:28:26 +0000"/>
                            <attachment id="60792" name="mongodb.first-configsvr.log.2015-01-03T21-22-34.gz" size="2036180" author="john.morales@mongodb.com" created="Sun, 4 Jan 2015 21:28:26 +0000"/>
                            <attachment id="60793" name="mongodb.primary-recipient.log.2015-01-03T21-22-31.gz" size="1676180" author="john.morales@mongodb.com" created="Sun, 4 Jan 2015 21:28:26 +0000"/>
                            <attachment id="60794" name="mongos.log.2015-01-03T21-22-44.gz" size="1340670" author="john.morales@mongodb.com" created="Sun, 4 Jan 2015 21:28:26 +0000"/>
                            <attachment id="60797" name="mongos.sh.status.out" size="55789" author="john.morales@mongodb.com" created="Sun, 4 Jan 2015 21:39:33 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 5 Jan 2015 23:40:22 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        9 years, 6 weeks, 3 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[<s><a href='https://jira.mongodb.org/browse/SERVER-9287'>SERVER-9287</a></s>]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>false</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            9 years, 6 weeks, 3 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>john.morales@mongodb.com</customfieldvalue>
            <customfieldvalue>siyuan.zhou@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrlfo7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hs55pj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>155102</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;I&apos;ve reproduced the behavior consistently (once each on mmapv1 and wiredTiger) but &lt;em&gt;only&lt;/em&gt; on a rather complicated setup of a 10 shard cluster deployed on EC2 using MMS Automation. Unfortunately, attempts with a simpler workload generator on a locally deployed cluster on OS X have been &lt;em&gt;unable&lt;/em&gt; to reproduce it.&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsbz5b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>