<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:26:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-51721] dataSize do not reduce after chunks migrated</title>
                <link>https://jira.mongodb.org/browse/SERVER-51721</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Recently I noticed an issue after upgrading from 4.0.14 to 4.0.20.&lt;/p&gt;

&lt;p&gt;On MongoDB 4.0.14, I added one shard to an existing two-shard cluster and turned on the balancer. Chunks migrated to the newly added (3rd) shard, and dataSize decreased on the older shards. After the migration I reclaimed storageSize by running &quot;compact&quot;.&lt;/p&gt;

&lt;p&gt;After upgrading to 4.0.20, I added one more shard to the three existing shards and turned on the balancer. Chunks migrated to the newly added (4th) shard, but dataSize did not decrease on the older shards. I also checked &quot;file bytes available for reuse&quot; in db.stats() and db.coll.stats().&lt;/p&gt;</description>
                <environment></environment>
        <key id="1518043">SERVER-51721</key>
            <summary>dataSize do not reduce after chunks migrated</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="edwin.zhou@mongodb.com">Edwin Zhou</assignee>
                                    <reporter username="dheeraj.dba7@gmail.com">Dheeraj G</reporter>
                        <labels>
                    </labels>
                <created>Sun, 18 Oct 2020 16:54:56 +0000</created>
                <updated>Thu, 5 Nov 2020 15:31:27 +0000</updated>
                            <resolved>Thu, 5 Nov 2020 15:31:27 +0000</resolved>
                                                                    <component>Storage</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="3476222" author="JIRAUSER1257066" created="Wed, 4 Nov 2020 18:51:31 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dheeraj.dba7%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;dheeraj.dba7@gmail.com&quot;&gt;dheeraj.dba7@gmail.com&lt;/a&gt; ,&lt;/p&gt;

&lt;p&gt;Thank you for providing updates to your issue! I hope your continued investigation in the community forums has helped you understand what to expect in chunk migrations. I&apos;m going to close this ticket as we&apos;ve redirected your issue to the &lt;a href=&quot;https://community.mongodb.com/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MongoDB Developer Community Forums&lt;/a&gt;, but if your investigation leads you to believe that you&apos;ve run into a bug, we can continue that discussion in the SERVER project.&lt;/p&gt;

&lt;p&gt;Best,&lt;/p&gt;

&lt;p&gt;Edwin&lt;/p&gt;</comment>
                            <comment id="3462275" author="JIRAUSER1257316" created="Sat, 24 Oct 2020 22:01:33 +0000"  >&lt;p&gt;Hi Edwin,&lt;/p&gt;

&lt;p&gt;Since you have looked at this issue, I am sharing my observations after adding a 5th shard. The data has been balanced across the new shard, but I do not see any reduction in dataSize on &quot;rs-qa-c_0&quot;. &quot;rs-qa-c_2&quot; holds a similar dataset in terms of document count and chunks, yet it is about 1/4 the size of &quot;rs-qa-c_0&quot;.&lt;/p&gt;



&lt;p&gt;---------------------------------------------------------------------------&lt;/p&gt;

&lt;p&gt;MongoDB Enterprise mongos&amp;gt; db.C_C.getShardDistribution()&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Shard &lt;font color=&quot;#ff8b00&quot;&gt;rs-qa-c_2&lt;/font&gt; at rs-qa-c_2/dc616512.domain:27017,dc616513.domain:27017&lt;/b&gt;&lt;br/&gt;
 &lt;b&gt;data : 308.39GiB docs : &lt;font color=&quot;#ff8b00&quot;&gt;10716176&lt;/font&gt; chunks : 52479&lt;/b&gt;&lt;br/&gt;
 &lt;b&gt;estimated data per chunk : 6.01MiB&lt;/b&gt;&lt;br/&gt;
 &lt;b&gt;estimated docs per chunk : 204&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Shard rs-qa-c_4 at rs-qa-c_4/dc1008178.domain:27017,dc1008211.domain:27017&lt;br/&gt;
 data : 495.43GiB docs : 4879467 chunks : 52026&lt;br/&gt;
 estimated data per chunk : 9.75MiB&lt;br/&gt;
 estimated docs per chunk : 93&lt;/p&gt;

&lt;p&gt;Shard rs-qa-c_3 at rs-qa-c_3/dc1008002.domain:27017,dc1008003.domain:27017&lt;br/&gt;
 data : 555.89GiB docs : 5315829 chunks : 52470&lt;br/&gt;
 estimated data per chunk : 10.84MiB&lt;br/&gt;
 estimated docs per chunk : 101&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Shard &lt;font color=&quot;#ff8b00&quot;&gt;rs-qa-c_0&lt;/font&gt; at rs-qa-c_0/dc615353.domain:27017,dc615354.domain:27017&lt;/b&gt;&lt;br/&gt;
 &lt;b&gt;data : 1206.54GiB docs : &lt;font color=&quot;#ff8b00&quot;&gt;11328011&lt;/font&gt; chunks : 52465&lt;/b&gt;&lt;br/&gt;
 &lt;b&gt;estimated data per chunk : 23.54MiB&lt;/b&gt;&lt;br/&gt;
 &lt;b&gt;estimated docs per chunk : 215&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Shard rs-qa-c_1 at rs-qa-c_1/dc615355.domain:27017,dc615356.domain:27017&lt;br/&gt;
 data : 316.28GiB docs : 3051536 chunks : 52456&lt;br/&gt;
 estimated data per chunk : 6.17MiB&lt;br/&gt;
 estimated docs per chunk : 58&lt;/p&gt;

&lt;p&gt;Totals&lt;br/&gt;
 data : 2882.54GiB docs : 35291019 chunks : 261896&lt;br/&gt;
 &lt;b&gt;Shard rs-qa-c_2 contains 10.69% data, &lt;font color=&quot;#ff8b00&quot;&gt;30.36% docs&lt;/font&gt; in cluster, avg obj size on shard : 30KiB&lt;/b&gt;&lt;br/&gt;
 Shard rs-qa-c_4 contains 17.18% data, 13.82% docs in cluster, avg obj size on shard : 106KiB&lt;br/&gt;
 Shard rs-qa-c_3 contains 19.28% data, 15.06% docs in cluster, avg obj size on shard : 109KiB&lt;br/&gt;
 &lt;b&gt;Shard rs-qa-c_0 contains 41.85% data, &lt;font color=&quot;#ff8b00&quot;&gt;32.09% docs&lt;/font&gt; in cluster, avg obj size on shard : 111KiB&lt;/b&gt;&lt;br/&gt;
 Shard rs-qa-c_1 contains 10.97% data, 8.64% docs in cluster, avg obj size on shard : 108KiB&lt;/p&gt;
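
&lt;p&gt;The avg obj size figures above can be reproduced from the data and docs columns. A minimal sketch in Python, assuming the values are copied by hand from the output above and that the shell reports binary units (1 GiB = 1024^3 bytes); the helper name avg_obj_kib is hypothetical:&lt;/p&gt;

```python
# Values copied by hand from the getShardDistribution() output above:
# shard name -> (data in GiB, document count).
GIB = 1024 ** 3  # assumption: the shell reports binary units
KIB = 1024

shards = {
    "rs-qa-c_2": (308.39, 10716176),
    "rs-qa-c_4": (495.43, 4879467),
    "rs-qa-c_3": (555.89, 5315829),
    "rs-qa-c_0": (1206.54, 11328011),
    "rs-qa-c_1": (316.28, 3051536),
}


def avg_obj_kib(data_gib, docs):
    """Average document size in KiB for one shard (hypothetical helper)."""
    return data_gib * GIB / docs / KIB


for name, (data_gib, docs) in shards.items():
    print(f"{name}: {avg_obj_kib(data_gib, docs):.1f} KiB per document")
```

&lt;p&gt;Truncated to whole KiB, the results match the avg obj size values in the output above. rs-qa-c_2 and rs-qa-c_0 hold a similar number of documents, but the average document on rs-qa-c_2 is roughly a quarter of the size, which is consistent with the difference in data size between the two shards.&lt;/p&gt;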

&lt;p&gt;-----------------------------------------------------------------------------------------------------&lt;/p&gt;</comment>
                            <comment id="3460515" author="JIRAUSER1257316" created="Thu, 22 Oct 2020 22:43:44 +0000"  >&lt;p&gt;Hi Edwin,&lt;/p&gt;

&lt;p&gt;Sure, I am continuing to investigate. I am also adding 2 more shards and will keep you posted once the data is balanced. At this moment I am not sure whether this is really caused by a difference in behavior between the two versions (4.0.14 and 4.0.20).&lt;/p&gt;

&lt;p&gt;Meanwhile, as you suggested, I will also reach out on the &lt;a href=&quot;https://community.mongodb.com/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MongoDB Developer Community Forums&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Dheeraj&lt;/p&gt;</comment>
                            <comment id="3460381" author="JIRAUSER1257066" created="Thu, 22 Oct 2020 21:55:51 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dheeraj.dba7%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;dheeraj.dba7@gmail.com&quot;&gt;dheeraj.dba7@gmail.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;One thing we can add at this point is that in 4.2 we moved the auto-splitter to run on the shard primary (&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9287&quot; title=&quot;Decision to split chunk should happen on shard mongod, not on mongos&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9287&quot;&gt;&lt;del&gt;SERVER-9287&lt;/del&gt;&lt;/a&gt;), which improves chunk splits. If you can upgrade to 4.2 you should see more predictable chunk split behavior as a result.&lt;/p&gt;

&lt;p&gt;The getShardDistribution results suggest that the additional data size on your c_0 shard is explainable by more documents and data being located there. Because of this, we aren&apos;t able to easily reason about whether a bug is involved here, and we aren&apos;t aware of any changes between 4.0.14 and 4.0.20 that would influence split or migration behavior.&lt;/p&gt;

&lt;p&gt;As such, we&apos;d like to suggest you investigate which chunks are larger, and why they may be larger. The best place to start if you are unsure will be to reach out to our community by posting on the &lt;a href=&quot;https://community.mongodb.com/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MongoDB Developer Community Forums&lt;/a&gt;. Should your investigation lead you to suspect a more specific bug, we could investigate further here in the SERVER project.&lt;/p&gt;

&lt;p&gt;Best,&lt;/p&gt;

&lt;p&gt;Edwin&lt;/p&gt;</comment>
                            <comment id="3458198" author="JIRAUSER1257316" created="Thu, 22 Oct 2020 03:40:43 +0000"  >&lt;p&gt;Hi Edwin,&lt;/p&gt;

&lt;p&gt;I haven&apos;t tried compact, but after reporting this I performed an initial sync on the secondary and then on the primary of the &quot;rs-qa-C_0&quot; shard.&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Performed&#160;&lt;a href=&quot;https://docs.mongodb.com/manual/reference/command/cleanupOrphaned/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;cleanupOrphaned&lt;/a&gt;. (attached script for reference)&lt;/li&gt;
	&lt;li&gt;Provided sh.status() and db.collectionName.getShardDistribution() (attached results)&lt;/li&gt;
	&lt;li&gt;FYI, I downgraded to 4.0.14 yesterday to run tests, so you&apos;ll see 4.0.14 in sh.status(). Also, I suspected an issue after noticing the dataSize on the &quot;rs-qa-C_0&quot; shard.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Dheeraj&lt;/p&gt;</comment>
                            <comment id="3457385" author="JIRAUSER1257066" created="Wed, 21 Oct 2020 20:21:52 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dheeraj.dba7%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;dheeraj.dba7@gmail.com&quot;&gt;dheeraj.dba7@gmail.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thanks for providing this information. Did you run &lt;tt&gt;compact&lt;/tt&gt; the second time you added a shard?&lt;/p&gt;

&lt;p&gt;Keep in mind that moving chunks between shards is not always expected to change the space used on disk. Unchanged disk usage can have a number of causes.&lt;/p&gt;

&lt;p&gt;The effectiveness of the &lt;tt&gt;compact&lt;/tt&gt; operation depends on your workload and may vary. You may not see any reduction in space on disk as a result of running this operation.&lt;/p&gt;

&lt;p&gt;Another possibility is that when chunks migrate to the new shard, the migrated chunks are removed from the origin shard asynchronously. That process may not have completed when you checked the size of the shard.&lt;/p&gt;

&lt;p&gt;It&apos;s also possible that the migration may have moved empty chunks which would not have any impact on &lt;tt&gt;dataSize&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;We would need to significantly narrow down what has occurred to determine if this is a bug or not.&lt;br/&gt;
 To help us, can you:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;First, rule out the known impact of orphaned documents using &lt;a href=&quot;https://docs.mongodb.com/manual/reference/command/cleanupOrphaned/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;cleanupOrphaned&lt;/a&gt;. If you need assistance running this command, I encourage you to ask our community by posting on the &lt;a href=&quot;https://community.mongodb.com/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MongoDB Developer Community Forums&lt;/a&gt;.&lt;/li&gt;
	&lt;li&gt;Then provide:
	&lt;ul&gt;
		&lt;li&gt;the output of &lt;a href=&quot;https://docs.mongodb.com/manual/reference/method/sh.status/index.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;sh.status()&lt;/a&gt;. We&apos;d like to understand how each collection is sharded and the current chunk counts per shard.&lt;/li&gt;
		&lt;li&gt;the output of &lt;a href=&quot;https://docs.mongodb.com/manual/reference/method/db.collection.getShardDistribution/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;db.&amp;lt;collectionName&amp;gt;.getShardDistribution()&lt;/a&gt;, to help us understand data distribution at a high level.&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Best,&lt;/p&gt;

&lt;p&gt;Edwin&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="283960" name="C_C-shardDistribution-10-21-2020.txt" size="1293" author="dheeraj.dba7@gmail.com" created="Thu, 22 Oct 2020 03:31:26 +0000"/>
                            <attachment id="283962" name="C_ERROR-shardDistribution-10-21-2020.txt" size="1297" author="dheeraj.dba7@gmail.com" created="Thu, 22 Oct 2020 03:31:26 +0000"/>
                            <attachment id="283961" name="C_E_C-shardDistribution-10-21-2020.txt" size="1315" author="dheeraj.dba7@gmail.com" created="Thu, 22 Oct 2020 03:31:26 +0000"/>
                            <attachment id="283963" name="C_RETRY-shardDistribution-10-21-2020.txt" size="1267" author="dheeraj.dba7@gmail.com" created="Thu, 22 Oct 2020 03:31:27 +0000"/>
                            <attachment id="283964" name="script-cleanupOrphaned.txt" size="333" author="dheeraj.dba7@gmail.com" created="Thu, 22 Oct 2020 03:31:27 +0000"/>
                            <attachment id="284341" name="sh-status()_10-24_1700CT.txt" size="4230" author="dheeraj.dba7@gmail.com" created="Sat, 24 Oct 2020 22:10:04 +0000"/>
                            <attachment id="283965" name="sh.status()-10-21-2020" size="4171" author="dheeraj.dba7@gmail.com" created="Thu, 22 Oct 2020 03:31:27 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 20 Oct 2020 19:36:52 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        3 years, 14 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>edwin.zhou@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            3 years, 14 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>dheeraj.dba7@gmail.com</customfieldvalue>
            <customfieldvalue>edwin.zhou@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hydatr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hxzjzb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[edwin.zhou@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hycx33:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>