<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:37:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-34819] Optimize the sharding balancer&apos;s cluster statistics gathering</title>
                <link>https://jira.mongodb.org/browse/SERVER-34819</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The sharding balancer currently issues &lt;tt&gt;listDatabases&lt;/tt&gt; against every single shard in order to access the &lt;tt&gt;totalSize&lt;/tt&gt; value. This value is used for ensuring that a shard&apos;s storage &lt;a href=&quot;https://docs.mongodb.com/manual/tutorial/manage-sharded-cluster-balancer/#change-the-maximum-storage-size-for-a-given-shard&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&lt;tt&gt;maxSize&lt;/tt&gt;&lt;/a&gt; is not exceeded for users which have that value set.&lt;/p&gt;

&lt;p&gt;The &lt;tt&gt;listDatabases&lt;/tt&gt; call is quite heavy, especially for nodes with large number of databases/collections since it will &lt;tt&gt;fstat&lt;/tt&gt; every single file under the instance.&lt;/p&gt;

&lt;p&gt;There are a number of optimizations we can make in order to make this statistics gathering less expensive in the presence of &lt;tt&gt;maxSize&lt;/tt&gt; (listed in order of preference):&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Add a parameter to &lt;tt&gt;listDatabases&lt;/tt&gt; to allow it to return cached data size instead of every time {{fstat}}ing all the files&lt;/li&gt;
	&lt;li&gt;Issue the &lt;tt&gt;listDatabases&lt;/tt&gt; call in parallel against all shards so different shards&apos; execution overlaps&lt;/li&gt;
	&lt;li&gt;Cache the per-shard statistics so that they are not collected on every single round/moveChunk invocation&lt;/li&gt;
	&lt;li&gt;Collect the per-shard statistics asynchronously so that multiple concurrent moveChunk requests can benefit&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment></environment>
        <key id="538472">SERVER-34819</key>
            <summary>Optimize the sharding balancer&apos;s cluster statistics gathering</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="1" iconUrl="https://jira.mongodb.org/images/icons/statuses/open.png" description="">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="tommaso.tocci@mongodb.com">Tommaso Tocci</assignee>
                                    <reporter username="kaloian.manassiev@mongodb.com">Kaloian Manassiev</reporter>
                        <labels>
                            <label>QWB</label>
                    </labels>
                <created>Thu, 3 May 2018 11:34:05 +0000</created>
                <updated>Tue, 6 Feb 2024 13:59:51 +0000</updated>
                                                                            <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>14</watches>
                                                                                                                <comments>
                            <comment id="1881597" author="kaloian.manassiev" created="Thu, 3 May 2018 14:18:28 +0000"  >&lt;p&gt;I can think of multiple ways to do that, but I am not sure which one is realistic and/or independent of the storage engine.&lt;/p&gt;

&lt;p&gt;For example - can the sizes be lazily updated as WT writes to its files and only for the files that it writes to/garbage collects? Not sure if this is possible in real-time, but perhaps  it can be combined with an asynchronous thread which runs periodically and only {{fstat}}s the files which were written to.&lt;/p&gt;

&lt;p&gt;Alternatively, can we use the same mechanism which is used for the fast count, combined with some multiplier of the average data size? This is what we use for chunk size estimation.&lt;/p&gt;</comment>
                            <comment id="1881512" author="milkie" created="Thu, 3 May 2018 12:59:02 +0000"  >&lt;p&gt;I&apos;m not sure how a cache would work.  The calculation today locks each database and then iterates through every collection, adding its size plus all the collection&apos;s index sizes together.  The expense here probably comes from iterating through all collections and indexes on the entire system.  If this value were cached, when would it get updated?&lt;/p&gt;</comment>
                            <comment id="1881444" author="kaloian.manassiev" created="Thu, 3 May 2018 11:41:42 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=milkie&quot; class=&quot;user-hover&quot; rel=&quot;milkie&quot;&gt;milkie&lt;/a&gt;, the sharding balancer uses the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/148d4f9ff706028bb30c5fd3624c8a874cd8eca4/src/mongo/s/shard_util.cpp#L82&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&lt;tt&gt;totalSize&lt;/tt&gt;&lt;/a&gt; value from &lt;tt&gt;listDatabases&lt;/tt&gt; in order to estimate the data usage on a shard and running this command in the presence of large number of collections/indexes is very heavy. What is the possibility of the storage team exposing a &lt;tt&gt;cachedTotalSize&lt;/tt&gt; value, which doesn&apos;t &lt;tt&gt;fstat&lt;/tt&gt; every single file on the instance?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10320">
                    <name>Documented</name>
                                                                <inwardlinks description="is documented by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="401707">SERVER-30060</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25133"><![CDATA[Sharding EMEA]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[500A000000aQcAuIAK, 5002K00000e6xGlQAI, 5002K000011DDBiQAO, 5006R00001nqmlfQAA]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 3 May 2018 12:59:02 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 40 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>connie.chen@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 40 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>milkie@mongodb.com</customfieldvalue>
            <customfieldvalue>kaloian.manassiev@mongodb.com</customfieldvalue>
            <customfieldvalue>tommaso.tocci@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htx52n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hto6iv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="7720">Sharding EMEA 2023-10-16</customfieldvalue>
    <customfieldvalue id="7721">Sharding EMEA 2023-10-30</customfieldvalue>
    <customfieldvalue id="7888">CAR Team 2023-11-13</customfieldvalue>
    <customfieldvalue id="7889">CAR Team 2023-11-27</customfieldvalue>
    <customfieldvalue id="7890">CAR Team 2023-12-11</customfieldvalue>
    <customfieldvalue id="7891">CAR Team 2023-12-25</customfieldvalue>
    <customfieldvalue id="7892">CAR Team 2024-01-08</customfieldvalue>
    <customfieldvalue id="8006">CAR Team 2024-01-22</customfieldvalue>
    <customfieldvalue id="8007">CAR Team 2024-02-05</customfieldvalue>
    <customfieldvalue id="8008">CAR Team 2024-02-19</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htwrbr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>