<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:42:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-57767] dataSize command returns wrong number of documents when there are orphaned documents</title>
                <link>https://jira.mongodb.org/browse/SERVER-57767</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Currently, the dataSize command on mongos &lt;a href=&quot;https://github.com/mongodb/mongo/blob/eae31861e0f813f0099e1d490c4a622d75cd5a08/src/mongo/s/commands/cluster_data_size_cmd.cpp#L88&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;does not target shards based on the range specified in the command&lt;/a&gt;. In addition, each shard &lt;a href=&quot;https://github.com/mongodb/mongo/blob/368d6ce6986680211fd4c09bb23431d08bb4e297/src/mongo/db/commands/dbcommands.cpp#L368-L376&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;uses the range in the command&lt;/a&gt;&#160;to do the counting. So if there are orphaned documents, the command will return the wrong number of documents.&#160;&lt;/p&gt;</description>
                <environment></environment>
        <key id="1787155">SERVER-57767</key>
            <summary>dataSize command returns wrong number of documents when there are orphaned documents</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="12300">Won&apos;t Do</resolution>
                                        <assignee username="garaudy.etienne@mongodb.com">Garaudy Etienne</assignee>
                                    <reporter username="cheahuychou.mao@mongodb.com">Cheahuychou Mao</reporter>
                        <labels>
                            <label>query-director-triage</label>
                            <label>sharding-product-sync</label>
                    </labels>
                <created>Wed, 16 Jun 2021 20:53:09 +0000</created>
                <updated>Fri, 14 Jan 2022 05:02:26 +0000</updated>
                            <resolved>Fri, 14 Jan 2022 05:02:26 +0000</resolved>
                                                                                        <votes>1</votes>
                                    <watches>10</watches>
                                                                                                                <comments>
                            <comment id="4075999" author="max.hirschhorn@10gen.com" created="Tue, 21 Sep 2021 22:40:33 +0000"  >&lt;p&gt;Thanks for the thoughtful questions &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kyle.suarez&quot; class=&quot;user-hover&quot; rel=&quot;kyle.suarez&quot;&gt;kyle.suarez&lt;/a&gt;. I&apos;ve flagged this ticket for the Sharding product sync meeting so we can discuss/research more about the use cases for the dataSize command.&lt;/p&gt;

&lt;p&gt;To add my own thoughts here:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;If the &lt;tt&gt;dataSize&lt;/tt&gt; command supported &lt;tt&gt;readConcern&lt;/tt&gt; options, then {level: &quot;available&quot;} would be a natural way to indicate whether to include unowned documents on the shards.&lt;/li&gt;
	&lt;li&gt;The storageStats returned by the $collStats stage is likely more helpful for users who want to understand the physical size on disk, because their configuration (by default) has WiredTiger compression enabled.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="4075813" author="kyle.suarez" created="Tue, 21 Sep 2021 21:30:03 +0000"  >&lt;p&gt;After a discussion with &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=christopher.harris&quot; class=&quot;user-hover&quot; rel=&quot;christopher.harris&quot;&gt;christopher.harris&lt;/a&gt;, while we think that changing &lt;tt&gt;dataSize&lt;/tt&gt; to exclude orphans makes sense, we want to point out that there might also be a use case for including orphans: specifically, if an administrator were interested in understanding the true physical size of a collection on disk, orphans and all.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=cheahuychou.mao&quot; class=&quot;user-hover&quot; rel=&quot;cheahuychou.mao&quot;&gt;cheahuychou.mao&lt;/a&gt;, what was the original use case that led to this ticket? Do you think it would make sense from a user perspective to, say, change the default behavior of &lt;tt&gt;dataSize&lt;/tt&gt; to ignore orphans but also introduce a new option flag that will include orphans if specified?&lt;/p&gt;</comment>
                            <comment id="4074799" author="chou.mao" created="Tue, 21 Sep 2021 16:30:55 +0000"  >&lt;p&gt;Confirmed with &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn&quot;&gt;max.hirschhorn&lt;/a&gt;&#160;that we don&apos;t use the dataSize command internally on sharding. We use &lt;a href=&quot;https://github.com/10gen/mongo/blob/95f96cff44a07b682281caf2632b3faf3461b042/src/mongo/db/catalog/collection_impl.cpp#L1445-L1447&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Collection::dataSize()&lt;/a&gt;&#160;in the code for&#160;&lt;a href=&quot;https://github.com/10gen/mongo/blob/8974dbdec0286ac47086b794c49214a9f26677bc/src/mongo/db/s/split_vector.cpp#L110&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;splitVector&lt;/a&gt;&#160;and &lt;a href=&quot;https://github.com/10gen/mongo/blob/03de046174e7f3ced4fc099ccc4e1a568c414654/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L887&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;chunk migration cloning&lt;/a&gt;. So I think we should make the dataSize command do shard filtering.&#160;&lt;/p&gt;</comment>
                            <comment id="4074487" author="kyle.suarez" created="Tue, 21 Sep 2021 15:09:58 +0000"  >&lt;p&gt;If we want the size of owned documents, then yes, it sounds like we should add a SHARDING_FILTER. But if an administrator wants to know the actual true size of data on disk, then the SHARDING_FILTER is potentially omitting relevant documents.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=cheahuychou.mao&quot; class=&quot;user-hover&quot; rel=&quot;cheahuychou.mao&quot;&gt;cheahuychou.mao&lt;/a&gt;, do you know if we use the &lt;tt&gt;dataSize&lt;/tt&gt; command internally? For example, is it used by sharding to determine if we need to make a chunk migration?&lt;/p&gt;

&lt;p&gt;Sending to &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sebastien.mendez&quot; class=&quot;user-hover&quot; rel=&quot;sebastien.mendez&quot;&gt;sebastien.mendez&lt;/a&gt;&apos;s team for investigation.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000wWdgOQAS]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 21 Sep 2021 15:09:58 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 20 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>garaudy.etienne@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 20 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>cheahuychou.mao@mongodb.com</customfieldvalue>
            <customfieldvalue>garaudy.etienne@mongodb.com</customfieldvalue>
            <customfieldvalue>kyle.suarez@mongodb.com</customfieldvalue>
            <customfieldvalue>max.hirschhorn@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzmvf3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hz7e67:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzmho7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>