<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:04:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-44088] Autosplitter seems to ignore some fat chunks </title>
                <link>https://jira.mongodb.org/browse/SERVER-44088</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We have a somewhat large and hot MongoDB cluster. Each shard is ~400 GiB in size. The autosplitter is configured to split chunks when they exceed 64 MB (the default value). However, I see many much larger chunks, some as large as 400 MB. These chunks are in a hot collection that constantly receives inserts and updates. Hot and cold documents are distributed evenly across chunks (we use hashed sharding on &lt;tt&gt;_id&lt;/tt&gt; with auto-generated &lt;tt&gt;ObjectId&lt;/tt&gt;).&lt;/p&gt;

&lt;p&gt;As far as I can see, the autosplitter is throttled by having to take a token from a pool of 5 tokens. I suspect that in a large shard with unevenly hot collections, autosplitter activity is skewed toward the hotter collections, so colder collections may never get the autosplitter&apos;s attention.&lt;/p&gt;

&lt;p&gt;We ran hard into this issue when we added new shards to our cluster and the balancer started moving chunks. When the balancer spots a chunk that is too big, it splits the chunk and moves the smaller chunks to the new shards. As a result, with chunks evenly distributed across the cluster, the older shards hold 3x as many documents as the newer shards.&lt;/p&gt;</description>
                <environment></environment>
        <key id="972396">SERVER-44088</key>
            <summary>Autosplitter seems to ignore some fat chunks </summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="eric.sedor@mongodb.com">Eric Sedor</assignee>
                                    <reporter username="sz">Sergey Zagursky</reporter>
                        <labels>
                    </labels>
                <created>Thu, 17 Oct 2019 20:45:20 +0000</created>
                <updated>Wed, 20 Nov 2019 15:48:27 +0000</updated>
                            <resolved>Wed, 20 Nov 2019 15:48:27 +0000</resolved>
                                    <version>4.0.10</version>
                                                                        <votes>1</votes>
                                    <watches>9</watches>
                                                                                                                <comments>
                            <comment id="2553812" author="eric.sedor" created="Wed, 20 Nov 2019 15:47:55 +0000"  >&lt;p&gt;That makes sense, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sz&quot; class=&quot;user-hover&quot; rel=&quot;sz&quot;&gt;sz&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;There are a number of known reasons why splits may not occur. We&apos;re going to close this ticket because, unfortunately, we would need details to investigate this specific case. However, we can re-open the ticket if you see a large chunk and are able to provide specifics about it.&lt;/p&gt;

&lt;p&gt;Alternatively, please let us know if the need for your tool goes away in version 4.2. We would be particularly interested to hear whether the issue persists after the changes to chunk splits in 4.2.&lt;/p&gt;

&lt;p&gt;Gratefully,&lt;br/&gt;
Eric&lt;/p&gt;</comment>
                            <comment id="2551325" author="sz" created="Wed, 20 Nov 2019 07:35:44 +0000"  >&lt;p&gt;@Kelly Lewis, unfortunately I can&apos;t provide the requested information now. We needed to resolve the situation urgently, so I split the chunks using the tool we wrote. The tool uses &lt;tt&gt;splitVector&lt;/tt&gt; to choose split points. Therefore I would not consider&#160;&lt;tt&gt;splitVector&lt;/tt&gt; a suspect in my case.&lt;/p&gt;</comment>
                            <comment id="2550254" author="kelly.lewis" created="Tue, 19 Nov 2019 20:57:27 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sz&quot; class=&quot;user-hover&quot; rel=&quot;sz&quot;&gt;sz&lt;/a&gt;, can you please provide the information Eric requested for the specific chunk?&lt;/p&gt;</comment>
                            <comment id="2518380" author="eric.sedor" created="Tue, 5 Nov 2019 17:42:59 +0000"  >&lt;p&gt;Understood &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sz&quot; class=&quot;user-hover&quot; rel=&quot;sz&quot;&gt;sz&lt;/a&gt;. To investigate a specific bug we think it makes sense to focus on whether or not splitVector is accurately selecting a good split point for a chunk being split.&lt;/p&gt;

&lt;p&gt;For a specific chunk can you provide:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;The entry from config.chunks for that chunk prior to a split&lt;/li&gt;
	&lt;li&gt;The entry from config.chunks for the resulting split chunks&lt;/li&gt;
	&lt;li&gt;The results of a count operation using $min and $max to target the range of documents in the resulting split chunks (&lt;a href=&quot;https://docs.mongodb.com/manual/reference/operator/meta/max/index.html#use-with-min&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://docs.mongodb.com/manual/reference/operator/meta/max/index.html#use-with-min&lt;/a&gt;). Note that $min and $max allow you to query on the hashed ranges provided in sh.status() or in the chunk entry.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="2507844" author="sz" created="Wed, 30 Oct 2019 06:20:34 +0000"  >&lt;p&gt;Eric Sedor, I have identified many such individual chunks. While we do have unexpectedly large chunks in some collections due to a bad shard key choice, there are many chunks that have grown beyond any reasonable size because the autosplitter is still too passive.&lt;/p&gt;

&lt;p&gt;I made a tool that went through all the chunks and split those that were too large. It takes a chunk and applies a `splitVector` command to it. Then, if `splitVector` returns one or more split points, it uses the `split` command to split the chunk. As far as I can see, this is exactly how the autosplitter works. The tool has helped the balancer equalize document counts across our shards.&lt;/p&gt;</comment>
                            <comment id="2507665" author="eric.sedor" created="Tue, 29 Oct 2019 23:07:33 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sz&quot; class=&quot;user-hover&quot; rel=&quot;sz&quot;&gt;sz&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;As &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dmitry.agranat&quot; class=&quot;user-hover&quot; rel=&quot;dmitry.agranat&quot;&gt;dmitry.agranat&lt;/a&gt; mentions, there are many improvements, both in 4.2 and planned for the future, that will help with balance in sharded clusters. To investigate a specific issue as a bug, we would want to understand in detail what has happened to a specific chunk. Are you able to identify a specific chunk that has been split in an unequal way, or that has grown large without ever being considered for a split?&lt;/p&gt;

&lt;p&gt;Gratefully,&lt;br/&gt;
Eric&lt;/p&gt;</comment>
                            <comment id="2500485" author="petr.ivanov.s@gmail.com" created="Fri, 25 Oct 2019 07:14:07 +0000"  >&lt;p&gt;We&apos;re using a configuration where a mongos is set up side-by-side with every service that talks to the mongo cluster. So the number varies depending on the instance size we use at a given moment, but you can safely assume that we&apos;re talking about dozens or more here.&#160;&lt;/p&gt;</comment>
                            <comment id="2500049" author="eric.sedor" created="Thu, 24 Oct 2019 22:09:50 +0000"  >&lt;p&gt;Yea &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=petr.ivanov.s%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;petr.ivanov.s@gmail.com&quot;&gt;petr.ivanov.s@gmail.com&lt;/a&gt;, I mean mongos instances. Sorry for being unclear!&lt;/p&gt;</comment>
                            <comment id="2498605" author="petr.ivanov.s@gmail.com" created="Thu, 24 Oct 2019 06:46:37 +0000"  >&lt;p&gt;Hi, Eric.&#160;&lt;/p&gt;

&lt;p&gt;By routers do you mean mongos instances?&lt;/p&gt;</comment>
                            <comment id="2497974" author="eric.sedor" created="Wed, 23 Oct 2019 18:25:50 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=petr.ivanov.s%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;petr.ivanov.s@gmail.com&quot;&gt;petr.ivanov.s@gmail.com&lt;/a&gt;, I wanted to add a question: Can you please let us know how many routers you run in this cluster?&lt;/p&gt;</comment>
                            <comment id="2496969" author="petr.ivanov.s@gmail.com" created="Wed, 23 Oct 2019 05:44:35 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dmitry.agranat&quot; class=&quot;user-hover&quot; rel=&quot;dmitry.agranat&quot;&gt;dmitry.agranat&lt;/a&gt;, does this imply that the auto-splitter will eventually process every oversized chunk, even if the thread pool in question was overloaded at the moment of the last write to a given oversized chunk?&#160;&lt;/p&gt;</comment>
                            <comment id="2494336" author="dmitry.agranat" created="Tue, 22 Oct 2019 13:02:03 +0000"  >&lt;p&gt;Hi Sergey,&lt;/p&gt;

&lt;p&gt;In 4.2 we moved the auto-splitter to run on the shard primary (&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9287&quot; title=&quot;Decision to split chunk should happen on shard mongod, not on mongos&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9287&quot;&gt;&lt;del&gt;SERVER-9287&lt;/del&gt;&lt;/a&gt;). Instead of a fixed number of tickets, it now uses a ThreadPool with 20 threads to schedule the split work, so it will at least be fair between collections and won&apos;t completely neglect the colder ones.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Dima&lt;/p&gt;</comment>
                            <comment id="2493325" author="eric.sedor" created="Mon, 21 Oct 2019 19:36:55 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sz&quot; class=&quot;user-hover&quot; rel=&quot;sz&quot;&gt;sz&lt;/a&gt;, thanks for this submission. We are looking into it and will likely follow up with some questions. Thanks in advance for your patience.&lt;/p&gt;

&lt;p&gt;Eric&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="134394">SERVER-13806</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="71153">SERVER-9287</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="80274">SERVER-10024</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 21 Oct 2019 19:36:55 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 12 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>eric.sedor@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 12 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>dmitry.agranat@mongodb.com</customfieldvalue>
            <customfieldvalue>eric.sedor@mongodb.com</customfieldvalue>
            <customfieldvalue>kelly.lewis@mongodb.com</customfieldvalue>
            <customfieldvalue>petr.ivanov.s@gmail.com</customfieldvalue>
            <customfieldvalue>sz</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvxynr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hvmf7j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[eric.sedor@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvxkx3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>