<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:39:47 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-35431] rollback does not correct sizeStorer data sizes</title>
                <link>https://jira.mongodb.org/browse/SERVER-35431</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We just keep the data size the same when we recover to a stable timestamp instead of correcting it like we do with counts: &lt;a href=&quot;https://github.com/mongodb/mongo/blob/f757bc52b926943bc748f0dc33173ab16e980f61/src/mongo/db/repl/storage_interface_impl.cpp#L1025-L1028&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/blob/f757bc52b926943bc748f0dc33173ab16e980f61/src/mongo/db/repl/storage_interface_impl.cpp#L1025-L1028&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means that the size reported in collStats will be wrong. It can also have the side effect of slowly decreasing the effective size of a capped collection, since the system will think it&apos;s fuller than it actually is. Running validate will fix the size.&lt;/p&gt;</description>
                <environment></environment>
        <key id="554780">SERVER-35431</key>
            <summary>rollback does not correct sizeStorer data sizes</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-server-execution">Backlog - Storage Execution Team</assignee>
                                    <reporter username="judah.schvimer@mongodb.com">Judah Schvimer</reporter>
                        <labels>
                            <label>pm-1820</label>
                    </labels>
                <created>Tue, 5 Jun 2018 23:17:24 +0000</created>
                <updated>Tue, 6 Dec 2022 03:27:20 +0000</updated>
                                                                            <component>Replication</component>
                                        <votes>0</votes>
                                    <watches>13</watches>
                                                                                                                <comments>
                            <comment id="2785627" author="geert.bosch" created="Wed, 5 Feb 2020 20:14:44 +0000"  >&lt;p&gt;This ticket really is two issues:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Fix &lt;tt&gt;dataSize&lt;/tt&gt; to be correct in the presence of crashes/rollbacks. That&apos;s work that falls on the Storage Execution team. It&apos;s a significant chunk of work, but something I think we&apos;ll need to do.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul&gt;
	&lt;li&gt;Decide whether &lt;tt&gt;storageSize&lt;/tt&gt; or &lt;tt&gt;dataSize&lt;/tt&gt; is the better metric to use for decisions on where to place data, etc. I think that &lt;tt&gt;dataSize&lt;/tt&gt; is generally better as &lt;tt&gt;storageSize&lt;/tt&gt; can differ significantly between nodes based on their history.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;A newly added node may have significantly less fragmentation and better compression than a long-lived node that has processed many remove and update operations. Deciding chunk migration based on &lt;tt&gt;storageSize&lt;/tt&gt; could lead to unstable behavior where chunks move back and forth depending on which node of a replica set is used to find the &lt;tt&gt;storageSize&lt;/tt&gt; of a collection. Additionally, &lt;tt&gt;dataSize&lt;/tt&gt; is important because it determines memory pressure for data access. If we balanced two shards so that both have a &lt;tt&gt;storageSize&lt;/tt&gt; of 100 GB, but one uncompresses to 300 GB and the other to 600 GB, it is likely that the latter node will perform much worse, as it can cache a much smaller fraction of its data. The expectation is that over time storage sizes will balance out.&lt;/p&gt;</comment>
                            <comment id="1989069" author="kaloian.manassiev" created="Wed, 29 Aug 2018 17:31:13 +0000"  >&lt;p&gt;The &lt;tt&gt;enableSharding&lt;/tt&gt; command uses the &lt;a href=&quot;https://github.com/mongodb/mongo/blob/5699eaafce230c4d6975bbe8a670a91f0487ebd4/src/mongo/s/shard_util.cpp#L82&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&lt;tt&gt;totalSize&lt;/tt&gt;&lt;/a&gt; field from &lt;tt&gt;listDatabases&lt;/tt&gt;. Looking at &lt;tt&gt;listDatabases&lt;/tt&gt;, this value is derived from &lt;a href=&quot;https://github.com/mongodb/mongo/blob/5699eaafce230c4d6975bbe8a670a91f0487ebd4/src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp#L147&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&lt;tt&gt;DatabaseCatalogEntry::sizeOnDisk&lt;/tt&gt;&lt;/a&gt;, which eventually calls into &lt;a href=&quot;https://github.com/mongodb/mongo/blob/5699eaafce230c4d6975bbe8a670a91f0487ebd4/src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp#L147&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&lt;tt&gt;RecordStore::storageSize&lt;/tt&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So I guess the answer to your question is that the primary shard selection uses &lt;tt&gt;storageSize&lt;/tt&gt; and not &lt;tt&gt;dataSize&lt;/tt&gt;.&lt;/p&gt;</comment>
                            <comment id="1988134" author="alyson.cabral" created="Tue, 28 Aug 2018 20:11:18 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=spencer&quot; class=&quot;user-hover&quot; rel=&quot;spencer&quot;&gt;spencer&lt;/a&gt; or &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kaloian.manassiev&quot; class=&quot;user-hover&quot; rel=&quot;kaloian.manassiev&quot;&gt;kaloian.manassiev&lt;/a&gt; do we know if we choose the primary&#160;shard for a database with dataSize or storageSize?&#160;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.mongodb.com/manual/core/sharded-cluster-shards/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://docs.mongodb.com/manual/core/sharded-cluster-shards/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I believe we spoke about this in person, but just so it&apos;s captured here: in addition to capped collections ending up at the wrong effective size, these numbers are also used in balancing.&#160;&lt;/p&gt;</comment>
                            <comment id="1919344" author="michael.cahill" created="Wed, 13 Jun 2018 12:20:41 +0000"  >&lt;p&gt;We can address this issue for capped collections without dramatic changes such as &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-35565&quot; title=&quot;Change capped collection age-out to be based on collection storageSize, not dataSize&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-35565&quot;&gt;&lt;del&gt;SERVER-35565&lt;/del&gt;&lt;/a&gt;. In particular, for capped collections we can correct the data size during rollback: the only permitted operations are inserts, so we have the size of the inserted documents that are rolled back.&lt;/p&gt;

&lt;p&gt;For general collections, we could reduce the drift by (a) accounting for inserts that are rolled back and (b) estimating the effect of deletes on the data size (e.g., by assuming that each deleted document has the average document size). We don&apos;t have enough information (either in the oplog or efficiently available in WiredTiger) to handle all size-changing updates, but we should be able to avoid systematic drift.&lt;/p&gt;</comment>
                            <comment id="1914241" author="greg.mckeon" created="Thu, 7 Jun 2018 19:39:13 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=spencer&quot; class=&quot;user-hover&quot; rel=&quot;spencer&quot;&gt;spencer&lt;/a&gt; to follow up with &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=milkie&quot; class=&quot;user-hover&quot; rel=&quot;milkie&quot;&gt;milkie&lt;/a&gt; to see if there&apos;s a possible fix for this in the storage layer.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="93555">SERVER-11113</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="425996">SERVER-31020</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="543453">SERVER-34977</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="558205">DOCS-11792</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="558201">SERVER-35565</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25136"><![CDATA[Storage Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 7 Jun 2018 19:39:13 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 1 week ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-2944</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 1 week ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>18.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>alyson.cabral@mongodb.com</customfieldvalue>
            <customfieldvalue>backlog-server-execution</customfieldvalue>
            <customfieldvalue>geert.bosch@mongodb.com</customfieldvalue>
            <customfieldvalue>greg.mckeon@mongodb.com</customfieldvalue>
            <customfieldvalue>judah.schvimer@mongodb.com</customfieldvalue>
            <customfieldvalue>kaloian.manassiev@mongodb.com</customfieldvalue>
            <customfieldvalue>michael.cahill@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htzv53:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr24kn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htzhef:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>