<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:53:17 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-20140] shard balancer fails to split chunks with more than 250000 docs</title>
                <link>https://jira.mongodb.org/browse/SERVER-20140</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;is it possible that a regression re-surfaced any of these issues? &lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9365&quot; title=&quot;mongod always split at  250000 position&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9365&quot;&gt;&lt;del&gt;SERVER-9365&lt;/del&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9498&quot; title=&quot;Possible bug in SplitVector: Mongodb keeps on splitting the same chunk over and over again for hours&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9498&quot;&gt;&lt;del&gt;SERVER-9498&lt;/del&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9690&quot; title=&quot;SplitVector fails to find the mid-point of a chunk&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9690&quot;&gt;&lt;del&gt;SERVER-9690&lt;/del&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9792&quot; title=&quot;Wrong maxChunkSize on SplitVector w/ force&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9792&quot;&gt;&lt;del&gt;SERVER-9792&lt;/del&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-10271&quot; title=&quot;jstests/sharding/count1.js failing on multiple platforms&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-10271&quot;&gt;&lt;del&gt;SERVER-10271&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;i&apos;m seeing very very similar behavior:&lt;/p&gt;

&lt;p&gt;mongodb 3.0.4 sharded collections with 64MB chunks using wiredtiger&lt;/p&gt;

&lt;p&gt;one collection with documents that average 2kB in size&lt;br/&gt;
one collection with documents that average 40 bytes in size&lt;/p&gt;

&lt;p&gt;the collection with 2kB size docs is evenly distributed&lt;br/&gt;
the collection with 40B size docs is nearly entirely jumbo chunks&lt;/p&gt;

&lt;p&gt;running the balancer does not seem to automatically split chunks - just marks them as jumbo.&lt;/p&gt;

&lt;p&gt;i can run pass after pass of sh.splitFind on each chunk until there are no jumbo chunks left and then more things get balanced. &lt;/p&gt;

&lt;p&gt;except then when i run the balancer again i get more chunks marked as jumbo and then i need to do splits again.&lt;/p&gt;

&lt;p&gt;basically to get the cluster evenly distributed after an initial load i have to alternate splitting and balancing for days.&lt;/p&gt;</description>
                <environment></environment>
        <key id="227171">SERVER-20140</key>
            <summary>shard balancer fails to split chunks with more than 250000 docs</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="underrun">Derek Wilson</reporter>
                        <labels>
                    </labels>
                <created>Wed, 26 Aug 2015 15:48:46 +0000</created>
                <updated>Fri, 28 Aug 2015 19:26:05 +0000</updated>
                            <resolved>Wed, 26 Aug 2015 21:30:57 +0000</resolved>
                                    <version>3.0.4</version>
                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="1017326" author="underrun" created="Fri, 28 Aug 2015 19:26:05 +0000"  >&lt;p&gt;Thanks... but that workaround doesn&apos;t work for my use case&lt;/p&gt;</comment>
                            <comment id="1017267" author="ramon.fernandez" created="Fri, 28 Aug 2015 18:35:42 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=underrun&quot; class=&quot;user-hover&quot; rel=&quot;underrun&quot;&gt;underrun&lt;/a&gt;, this is to let you know that we&apos;ve posted a workaround for this issue in the &quot;Description&quot; section of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-19919&quot; title=&quot;Chunks that exceed 250000 docs but are under half chunk size get marked as jumbo&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-19919&quot;&gt;&lt;del&gt;SERVER-19919&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;</comment>
                            <comment id="1015451" author="ramon.fernandez" created="Wed, 26 Aug 2015 21:29:48 +0000"  >&lt;p&gt;Thanks for your report &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=underrun&quot; class=&quot;user-hover&quot; rel=&quot;underrun&quot;&gt;underrun&lt;/a&gt;. This bug was reported earlier in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-19919&quot; title=&quot;Chunks that exceed 250000 docs but are under half chunk size get marked as jumbo&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-19919&quot;&gt;&lt;del&gt;SERVER-19919&lt;/del&gt;&lt;/a&gt;, so I&apos;m going to mark this ticket as a duplicate. Please watch &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-19919&quot; title=&quot;Chunks that exceed 250000 docs but are under half chunk size get marked as jumbo&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-19919&quot;&gt;&lt;del&gt;SERVER-19919&lt;/del&gt;&lt;/a&gt; for updates; we&apos;re investigating a workaround until this issue is fixed.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;</comment>
                            <comment id="1015147" author="underrun" created="Wed, 26 Aug 2015 17:13:59 +0000"  >&lt;p&gt;right - i forgot to mention that because i have collections with very large documents and collections with very small documents, making the chunk size small enough to avoid this issue will mean i have way too few documents per chunk with my larger document collections. for instance, i would need to make chunk size about 9.5MB to keep a collection with average doc size of 40B under 250000 docs, but then chunks of collections with a 4kB average doc size will only have about 2.4k docs per chunk. with hundreds of millions of documents that means dozens (to hundreds) of thousands of chunks to manage, which could start to cause problems on the other end of the spectrum.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="225412">SERVER-19919</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 26 Aug 2015 19:00:40 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        8 years, 24 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>underrun</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            8 years, 24 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>underrun</customfieldvalue>
            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrkwan:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hscquv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;1) create collection sharded on {_id: 1}  &lt;/p&gt;

&lt;p&gt;2) turn off balancer  &lt;/p&gt;

&lt;p&gt;3) insert ~100M small docs like {_id: &quot;text&quot;, &quot;c&quot;: 12345} (this is a collection of counts of strings if you care to know the real world use case)  &lt;/p&gt;

&lt;p&gt;4) turn on balancer and wait til things stop moving  &lt;/p&gt;

&lt;p&gt;5) turn balancer off  &lt;/p&gt;

&lt;p&gt;6) manually find all jumbo chunks and run sh.splitFind() on them  &lt;/p&gt;

&lt;p&gt;7) go back to 4 forever (or at least it feels like it)  &lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsfpr3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>