<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:04:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-43878] Hint to perform a storage-order document scan</title>
                <link>https://jira.mongodb.org/browse/SERVER-43878</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;On rotating media and low-IOPS managed storage, we find that a low-selectivity query against a collection performs very poorly when the data is uncached, essentially issuing a random read for every six or so documents (I&apos;ve measured about 1000 documents/second on an HDD with 6 ms access time).&#160; In a rather lengthy support case, MongoDB support explained that even COLLSCAN does not proceed in storage order, and there appears to be no way to hint the query engine into performing a truly sequential read.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;However, while I agree that IXSCANs will not be performant for your situation of returning 1/3 of a collection from an environment with low random IOPS and high serial throughput, I am not sure&#160;&lt;tt&gt;$natural&lt;/tt&gt;&#160;will be faster if you are using WiredTiger. As Chris said above, given WT can place new documents randomly into any free space in a collection&apos;s data file, and file blocks can be randomly placed on drives,&#160;&lt;tt&gt;$natural&lt;/tt&gt;&#160;order is by no means equivalent to disk order.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;While hint({$natural: 1}) can steer a query from IXSCAN to COLLSCAN, this does not help in practice.&#160; At least, when we try the Java driver&apos;s equivalent, find(...).sort(&quot;$natural&quot;,1), we see no change in performance.&lt;/p&gt;

&lt;p&gt;Thus, it would be desirable to have a hint that explicitly requests &quot;storage order collection scan&quot;, to leverage the much higher sequential throughput on some storage media.&lt;/p&gt;</description>
                <environment></environment>
        <key id="964369">SERVER-43878</key>
            <summary>Hint to perform a storage-order document scan</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-server-storage-engines">Backlog - Storage Engines Team</assignee>
                                    <reporter username="john.lilley@redpointglobal.com">John Lilley</reporter>
                        <labels>
                    </labels>
                <created>Tue, 8 Oct 2019 13:12:41 +0000</created>
                <updated>Wed, 29 Mar 2023 15:16:15 +0000</updated>
                                            <version>3.6.12</version>
                                                    <component>Querying</component>
                                        <votes>0</votes>
                                    <watches>15</watches>
                                                                                                                <comments>
                            <comment id="3426470" author="john.lilley@redpointglobal.com" created="Mon, 5 Oct 2020 20:21:40 +0000"  >&lt;p&gt;I agree with Keith: I would prefer to have MongoDB just figure it out and choose the optimal access pattern.&#160; I originally filed this issue after noticing that queries that used a sequential-scan plan, or that returned a lot of documents, did not seem to exploit the much faster sequential access pattern typical of managed storage and spinning media.&#160; Oddly, an index build &lt;b&gt;does&lt;/b&gt; use fast sequential access, so it seems possible.&lt;/p&gt;

&lt;p&gt;However, I don&apos;t think that triggering this behavior needs to depend on a hint of &quot;expects to return a lot of documents&quot;.&#160; Wouldn&apos;t any query plan that results in a sequential collection scan potentially benefit from this optimization, given the right combination of cache memory, sequential access performance, and random access performance?&#160; Why not have MongoDB automatically tune its access pattern based on observed storage performance, as another facet of the query optimizer: a &quot;collection scan optimizer&quot;?&lt;/p&gt;</comment>
                            <comment id="3426277" author="keith.smith" created="Mon, 5 Oct 2020 19:12:36 +0000"  >&lt;p&gt;I&apos;d suggest that an application-provided hint should describe the application&apos;s needs or expectations, rather than directing the storage system&apos;s implementation. &#160;I.e., perhaps what is relevant here is that the application expects a query to return a large fraction of the documents in a collection and it doesn&apos;t care what order they are returned in. &#160;This would let MongoDB and/or WiredTiger determine the best way to optimize for what the application is going to do, possibly taking into account things that aren&apos;t visible to the application (e.g., other system activity, cache pressure, parallelism in the underlying storage system, etc.)&lt;/p&gt;</comment>
                            <comment id="2478226" author="john.lilley@redpointglobal.com" created="Fri, 11 Oct 2019 16:48:39 +0000"  >&lt;p&gt;Thanks.&#160; One other thing I have observed: when a new index is created on a populated collection, I can watch the disk I/O profile and that process definitely uses a more efficient sequential scan.&#160; For example, on a given table with slow SSD backing it, the index-creation scan might show 100MB/sec but the table query might show 15MB/sec.&lt;/p&gt;</comment>
                            <comment id="2478145" author="daniel.hatcher" created="Fri, 11 Oct 2019 16:09:33 +0000"  >&lt;p&gt;Thank you for opening this ticket. I&apos;ll pass this along to the appropriate team to consider.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25135"><![CDATA[Storage Engines]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000hQrS7QAK, 5006R00001rZOp0QAG]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 10 Oct 2019 16:40:54 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        3 years, 18 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            3 years, 18 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-storage-engines</customfieldvalue>
            <customfieldvalue>daniel.hatcher@mongodb.com</customfieldvalue>
            <customfieldvalue>john.lilley@redpointglobal.com</customfieldvalue>
            <customfieldvalue>keith.smith@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvwlxj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr708v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10555" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Story Points</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvw86v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>