<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:16:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-7960] Chunk size different on shards of same MongoDB cluster</title>
                <link>https://jira.mongodb.org/browse/SERVER-7960</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We have set up a 6-shard MongoDB cluster with a replication factor of 3.&lt;br/&gt;
The router process was started without explicit chunk size or oplog size values, so the defaults were used.&lt;/p&gt;

&lt;p&gt;Shard3 has an estimated 162 MB of data per chunk, while the other shards have 64-90 MB per chunk.&lt;br/&gt;
All shards run on the same type of instance in Amazon EC2.&lt;br/&gt;
The output of the db.&amp;lt;collection&amp;gt;.getShardDistribution() command is as follows:&lt;/p&gt;

&lt;p&gt;Shard shard1 at shard1/&amp;lt;ips of shard1&amp;gt;&lt;br/&gt;
 data : 38.8Gb docs : 43049426 chunks : 621&lt;br/&gt;
 estimated data per chunk : 63.99Mb&lt;br/&gt;
 estimated docs per chunk : 69322&lt;/p&gt;

&lt;p&gt;Shard shard2 at shard2/&amp;lt;ips of shard2&amp;gt;&lt;br/&gt;
 data : 40.24Gb docs : 44644092 chunks : 620&lt;br/&gt;
 estimated data per chunk : 66.47Mb&lt;br/&gt;
 estimated docs per chunk : 72006&lt;/p&gt;

&lt;p&gt;Shard shard3 at shard3/&amp;lt;ips of shard3&amp;gt;&lt;br/&gt;
 data : 102.65Gb docs : 113874252 chunks : 649&lt;br/&gt;
 estimated data per chunk : 161.97Mb&lt;br/&gt;
 estimated docs per chunk : 175461&lt;/p&gt;

&lt;p&gt;Shard shard4 at shard4/&amp;lt;ips of shard4&amp;gt;&lt;br/&gt;
 data : 54.51Gb docs : 60472368 chunks : 620&lt;br/&gt;
 estimated data per chunk : 90.04Mb&lt;br/&gt;
 estimated docs per chunk : 97536&lt;/p&gt;

&lt;p&gt;Shard shard5 at shard5/&amp;lt;ips of shard5&amp;gt;&lt;br/&gt;
 data : 50.48Gb docs : 56005174 chunks : 620&lt;br/&gt;
 estimated data per chunk : 83.38Mb&lt;br/&gt;
 estimated docs per chunk : 90330&lt;/p&gt;

&lt;p&gt;Shard shard6 at shard6/&amp;lt;ips of shard6&amp;gt;&lt;br/&gt;
 data : 46.32Gb docs : 51388397 chunks : 620&lt;br/&gt;
 estimated data per chunk : 76.51Mb&lt;br/&gt;
 estimated docs per chunk : 82884&lt;/p&gt;

&lt;p&gt;Totals&lt;br/&gt;
 data : 333.05Gb docs : 369433709 chunks : 3750&lt;br/&gt;
 Shard shard1 contains 11.65% data, 11.65% docs in cluster, avg obj size on shard : 967b&lt;br/&gt;
 Shard shard2 contains 12.08% data, 12.08% docs in cluster, avg obj size on shard : 967b&lt;br/&gt;
 Shard shard3 contains 30.82% data, 30.82% docs in cluster, avg obj size on shard : 967b&lt;br/&gt;
 Shard shard4 contains 16.36% data, 16.36% docs in cluster, avg obj size on shard : 967b&lt;br/&gt;
 Shard shard5 contains 15.15% data, 15.15% docs in cluster, avg obj size on shard : 967b&lt;br/&gt;
 Shard shard6 contains 13.91% data, 13.91% docs in cluster, avg obj size on shard : 967b&lt;/p&gt;</description>
                <environment>Linux</environment>
        <key id="59639">SERVER-7960</key>
            <summary>Chunk size different on shards of same MongoDB cluster</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.mongodb.org/images/icons/priorities/critical.svg">Critical - P2</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="stephen.steneker@mongodb.com">Stennie Steneker</assignee>
                                    <reporter username="krisant007">Santosh Kumar L</reporter>
                        <labels>
                            <label>Router</label>
                            <label>chunkSize</label>
                            <label>chunks</label>
                            <label>mongos</label>
                            <label>sharding</label>
                    </labels>
                <created>Mon, 17 Dec 2012 12:34:40 +0000</created>
                <updated>Fri, 8 Mar 2013 15:56:11 +0000</updated>
                            <resolved>Sun, 23 Dec 2012 21:17:00 +0000</resolved>
                                    <version>2.2.1</version>
                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="222938" author="stennie" created="Sun, 23 Dec 2012 21:17:00 +0000"  >&lt;p&gt;Hi Santosh,&lt;/p&gt;

&lt;p&gt;Without the data available, we cannot help troubleshoot this.  My suspicion is that the shard key chosen did not provide sufficient uniqueness and so some &quot;jumbo chunks&quot; were created.  Jumbo chunks cannot be split, so will continue to grow; they will also not be migrated by the balancer.&lt;/p&gt;

&lt;p&gt;The Uniform distribution in YCSB also appears to apply only to generating load, not to the uniqueness of keys for sharding:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Uniform: Choose an item uniformly at random. For example, when choosing a record, all records in the database are equally likely to be chosen.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;If you re-run this benchmark in future and still see the same issue, I would suggest starting a discussion on the mongodb-users group: &lt;a href=&quot;http://groups.google.com/group/mongodb-user&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://groups.google.com/group/mongodb-user&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Others testing YCSB with sharding will be able to share their feedback, and if there turns out to be a server issue we would then raise a Jira for it.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Stephen&lt;/p&gt;</comment>
                            <comment id="221161" author="krisant007" created="Fri, 21 Dec 2012 09:06:25 +0000"  >&lt;p&gt;We have terminated the cluster, so I cannot provide the output of the db.chunks.find({jumbo:true}) command.&lt;br/&gt;
All records inserted into the database are 1 KB each, and the &quot;_id&quot; field is the shard key. With the uniform distribution of YCSB, every record has a unique &quot;_id&quot; value.&lt;br/&gt;
So I want to understand why MongoDB is unable to keep chunks at 64 MB on all the shards.&lt;/p&gt;</comment>
                            <comment id="221102" author="stennie" created="Fri, 21 Dec 2012 07:13:06 +0000"  >&lt;p&gt;Hi Santosh,&lt;/p&gt;

&lt;p&gt;It looks like your chunks are balanced except on shard3.  I expect the size of the chunks is related to your choice of shard key. Perhaps you have some &quot;jumbo chunks&quot; which cannot be split or migrated on shard3.&lt;/p&gt;

&lt;p&gt;Can you include the output of:&lt;/p&gt;
&lt;pre&gt;use config;
db.chunks.find({jumbo:true})&lt;/pre&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Stephen&lt;/p&gt;</comment>
                            <comment id="221088" author="krisant007" created="Fri, 21 Dec 2012 06:52:16 +0000"  >
&lt;p&gt;Please find the sh.status() output below.&lt;/p&gt;


&lt;pre&gt;mongos&amp;gt; sh.status()
--- Sharding Status ---
  sharding version: { &quot;_id&quot; : 1, &quot;version&quot; : 3 }
  shards:
	{  &quot;_id&quot; : &quot;shard1&quot;,  &quot;host&quot; : &quot;&amp;lt;ip:port&amp;gt;&quot; }
	{  &quot;_id&quot; : &quot;shard2&quot;,  &quot;host&quot; : &quot;&amp;lt;ip:port&amp;gt;&quot; }
	{  &quot;_id&quot; : &quot;shard3&quot;,  &quot;host&quot; : &quot;&amp;lt;ip:port&amp;gt;&quot; }
	{  &quot;_id&quot; : &quot;shard4&quot;,  &quot;host&quot; : &quot;&amp;lt;ip:port&amp;gt;&quot; }
	{  &quot;_id&quot; : &quot;shard5&quot;,  &quot;host&quot; : &quot;&amp;lt;ip:port&amp;gt;&quot; }
	{  &quot;_id&quot; : &quot;shard6&quot;,  &quot;host&quot; : &quot;&amp;lt;ip:port&amp;gt;&quot; }
  databases:
	{  &quot;_id&quot; : &quot;admin&quot;,  &quot;partitioned&quot; : false,  &quot;primary&quot; : &quot;config&quot; }
	{  &quot;_id&quot; : &quot;test&quot;,  &quot;partitioned&quot; : true,  &quot;primary&quot; : &quot;shard3&quot; }
		test.usertable chunks:
				shard1	621
				shard2	620
				shard3	649
				shard4	620
				shard5	620
				shard6	620
			too many chunks to print, use verbose if you want to force print&lt;/pre&gt;</comment>
                            <comment id="221083" author="krisant007" created="Fri, 21 Dec 2012 06:44:27 +0000"  >&lt;p&gt;We are using the YCSB jar to insert data into the MongoDB cluster.&lt;br/&gt;
YCSB inserts serially into the DB, so these are normal inserts, not bulk inserts.&lt;br/&gt;
The YCSB jar file takes the following input parameters:&lt;/p&gt;

&lt;p&gt;1. mongos router ip:port&lt;br/&gt;
2. Database name in the MongoDB cluster&lt;br/&gt;
3. Number of records to be inserted&lt;br/&gt;
4. Distribution type to be used for inserting data&lt;/p&gt;



&lt;p&gt;regards,&lt;br/&gt;
Santosh&lt;/p&gt;</comment>
                            <comment id="221064" author="eliot" created="Fri, 21 Dec 2012 06:06:34 +0000"  >&lt;p&gt;Can you send the exact way you are running YCSB?&lt;/p&gt;</comment>
                            <comment id="221063" author="stennie" created="Fri, 21 Dec 2012 06:05:30 +0000"  >&lt;p&gt;Hi Santosh,&lt;/p&gt;

&lt;p&gt;Can you post the info from &lt;tt&gt;sh.status()&lt;/tt&gt;?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Stephen&lt;/p&gt;</comment>
                            <comment id="221059" author="krisant007" created="Fri, 21 Dec 2012 05:59:21 +0000"  >&lt;p&gt;I&apos;m inserting data using the YCSB framework.&lt;br/&gt;
These are plain inserts, as I&apos;m loading the data into the MongoDB cluster.&lt;/p&gt;</comment>
                            <comment id="221055" author="eliot" created="Fri, 21 Dec 2012 05:56:32 +0000"  >&lt;p&gt;How are you inserting data?&lt;br/&gt;
Insert? Bulk insert? Upsert? Something else?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 21 Dec 2012 05:56:32 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        11 years, 8 weeks, 3 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ian@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            11 years, 8 weeks, 3 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10020"><![CDATA[Linux]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>krisant007</customfieldvalue>
            <customfieldvalue>stephen.steneker@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hricp3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrl4r3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>37109</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;Set up a 6-shard cluster and start the router without explicitly specifying values for the chunkSize and oplogSize parameters.&lt;br/&gt;
Load the data into the cluster using YCSB clients.&lt;br/&gt;
Check the distribution of data once the data loading is done.&lt;br/&gt;
To do this, log in to any of the routers, switch to the database, and issue the following command:&lt;br/&gt;
db.&amp;lt;collection&amp;gt;.getShardDistribution()&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|ht05g7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>