<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:02:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-3294] Ability to keep data on disk in ~ index order</title>
                <link>https://jira.mongodb.org/browse/SERVER-3294</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;If all data points in a chunk were located sequentially on disk then migrations could be done using only sequential IO and not random IO. Additionally, it would not fragment the data files as it currently does when all chunks are intermingled.&lt;/p&gt;</description>
                <environment></environment>
        <key id="18463">SERVER-3294</key>
            <summary>Ability to keep data on disk in ~ index order</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="12300">Won&apos;t Do</resolution>
                                        <assignee username="backlog-server-execution">Backlog - Storage Execution Team</assignee>
                                    <reporter username="bdarfler">Benjamin Darfler</reporter>
                        <labels>
                            <label>gm-ack</label>
                    </labels>
                <created>Sun, 19 Jun 2011 15:37:37 +0000</created>
                <updated>Tue, 6 Dec 2022 05:42:55 +0000</updated>
                            <resolved>Mon, 13 Sep 2021 18:24:39 +0000</resolved>
                                                                    <component>Index Maintenance</component>
                    <component>MMAPv1</component>
                    <component>Sharding</component>
                    <component>Storage</component>
                                        <votes>16</votes>
                                    <watches>22</watches>
                                                                                                                <comments>
                            <comment id="3653248" author="geert.bosch" created="Mon, 8 Mar 2021 22:18:57 +0000"  >&lt;p&gt;I think we should close this as Won&apos;t Do. With WiredTiger, and supposedly any other advanced modern storage engine, the storage engine manages its own I/O. The nature of this I/O, OS and filesystem abstractions, and changing hardware characteristics, including device-level block remapping for defects or wear leveling of flash memory, make &quot;disk-order&quot; not well defined.&lt;/p&gt;

&lt;p&gt;Clustered indexes will ensure there is some locality when accessing documents in the same neighborhood according to the _id index order, whether the clusters are in disk order or not.&lt;/p&gt;</comment>
                            <comment id="3652937" author="louis.williams" created="Mon, 8 Mar 2021 19:58:24 +0000"  >&lt;p&gt;While this is &lt;em&gt;related&lt;/em&gt; to clustered indexes, this request is asking for collections to not just be logically in _id-order, but be physically stored on disk in _id-order. This is really a storage format request and I am putting this back in the backlog.&lt;/p&gt;</comment>
                            <comment id="1063701" author="mdcallag" created="Sun, 18 Oct 2015 13:14:09 +0000"  >&lt;p&gt;I don&apos;t see SSD replacing disk. Some people don&apos;t want to pay the cost difference, others need devices that can sustain higher write rates. For the same reason many deployments won&apos;t run on servers with 1T of RAM to cache their entire database. Just because they can doesn&apos;t mean they will pay to do so. &lt;/p&gt;

&lt;p&gt;However, there is efficiency to be gained with proper support for a clustered index. I hope that MongoDB can remove the hidden index used by RocksDB &amp;amp; WiredTiger. When running Linkbench on MongoDB the insert rate is much slower than on MySQL and part of the reason is extra indexes.&lt;br/&gt;
&lt;a href=&quot;http://smalldatum.blogspot.com/2015/07/linkbench-for-mysql-mongodb-with-cached.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://smalldatum.blogspot.com/2015/07/linkbench-for-mysql-mongodb-with-cached.html&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="1063668" author="markus.mahlberg@icloud.com" created="Sun, 18 Oct 2015 08:23:50 +0000"  >&lt;p&gt;As far as I can see, this would only show real benefit for mmapv1 storage on spinning disks &#8211; WiredTiger is COW anyway. And SSDs are highly encouraged for production (for a reason).&lt;/p&gt;

&lt;p&gt;So, forgive my French, this would only benefit users who are trying to save a few bucks by using spinning disks. And of those, &quot;only&quot; the ones using the mmapv1 storage engine, since on WiredTiger the documents would no longer be in index order after the first change to a document.&lt;/p&gt;

&lt;p&gt;Changing the way chunk splits and migrations work would impact all users, most likely with negative side effects on the performance side. Admittedly, this would only impact collections with a bad shard key which causes frequent chunk migrations, but it presumably would make the situation worse.&lt;/p&gt;

&lt;p&gt;Another point is that sooner or later spinning disks will be replaced by SSDs. So this feature would basically be added so that users can stick to a &quot;dying&quot; technology to save money. I am not sure whether that makes sense. In my experience, using spinning disks makes disk IO the bottleneck of a deployment, requiring more shards to be added to handle the total load. In at least two instances I was able to halve the number of shards my customers needed by scaling up a few shard nodes to SSDs and increasing the RAM accordingly, saving a great deal of money each month.&lt;/p&gt;

&lt;p&gt;That being said: if in doubt, and for my part I very much am, adhering to KISS is the better choice here.&lt;/p&gt;</comment>
                            <comment id="397089" author="justanyone" created="Tue, 6 Aug 2013 17:06:05 +0000"  >&lt;p&gt;The average Linux user will have the default configuration of the &apos;cfq&apos; IO scheduler. This lends itself to sequential IO, but not to random IO, since it tries to delay requests until it can see if any of them are going to the same area of disk. An experienced/advanced MongoDB user will configure their mounts as &apos;noop&apos; to speed things up (for data file writing, not for journal writing).  &lt;/p&gt;

&lt;p&gt;With this in mind, when migrating a chunk (thus cloning it), the receiving shard could write the cloned data in index order. Thus, any chunk migration would automatically rewrite the data in index order, solving this case. This would speed up IO since it would be a sequential write of more than one document at a time. Ideally, the core server code would bunch up the write as a single sequential set of bytes as opposed to a document at a time, if this is not already the case.&lt;/p&gt;

&lt;p&gt;So, to solve this, just: &lt;/p&gt;

&lt;p&gt; (a) ensure that migrated chunk data is written in index order, and&lt;br/&gt;
 (b) stimulate the balancer to move all the chunks at least once.&lt;/p&gt;
</comment>
                            <comment id="176914" author="nick.gerner@gmail.com" created="Thu, 18 Oct 2012 22:51:33 +0000"  >&lt;p&gt;Any updates on this? I would love to have clustered indexes. I&apos;m currently running large batch inserts and trying to insert them in roughly my desired order on disk, but I&apos;ve got an index which is the actual order I need to do large range scans, so if that could be clustered, that would be even better.&lt;/p&gt;

&lt;p&gt;However, that index is not on the shard key in our case (we want to spread large reads over all the shards to get better parallelism and reduce latency). Does that mean you&apos;re not planning on supporting our scenario?&lt;/p&gt;</comment>
                            <comment id="38273" author="eliot" created="Mon, 20 Jun 2011 03:12:33 +0000"  >&lt;p&gt;We&apos;re planning on doing something similar to this, but not quite the same.&lt;br/&gt;
We&apos;re not going to do physical files per chunk for a few reasons: &lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;splits become expensive&lt;/li&gt;
	&lt;li&gt;for indexes, would either have to do 1 per file which means search is very expensive, or if you don&apos;t then that&apos;s the bottleneck&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="38229" author="bdarfler" created="Sun, 19 Jun 2011 17:29:15 +0000"  >&lt;p&gt;However those reads and writes could be sequential.&lt;/p&gt;</comment>
                            <comment id="38224" author="bdarfler" created="Sun, 19 Jun 2011 15:41:39 +0000"  >&lt;p&gt;This might be accomplished by separate files per chunk. It would then mean that a chunk split is more expensive as the chunk would have to be read in and written out based on the new key ranges.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="159638">SERVER-15354</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="439040">SERVER-31357</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="69626">SERVER-9114</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25136"><![CDATA[Storage Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 20 Jun 2011 03:12:33 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 48 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 48 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-execution</customfieldvalue>
            <customfieldvalue>bdarfler</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>geert.bosch@mongodb.com</customfieldvalue>
            <customfieldvalue>justanyone</customfieldvalue>
            <customfieldvalue>louis.williams@mongodb.com</customfieldvalue>
            <customfieldvalue>mdcallag</customfieldvalue>
            <customfieldvalue>markus.mahlberg@icloud.com</customfieldvalue>
            <customfieldvalue>nick.gerner@gmail.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hroxgv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2pr3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6117</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|ht0u5b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>