<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:33:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-33556] range scan query optimizing</title>
                <link>https://jira.mongodb.org/browse/SERVER-33556</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;I&apos;m going to try and describe an IXSCAN performance optimization.&lt;/p&gt;

&lt;p&gt;Imagine a sharded collection that uses 2TB of storage on disk.  All queries to this collection do index range scans on the shard key.  These range scans very often span multiple chunks on disk (10, 20, or more).  The queries also apply a regex or other filters to parts of the document that are not in the shard key.  In other words, once a document is found to meet the shard key bounds, we always have to inspect its contents to know whether it should be returned.&lt;/p&gt;

&lt;p&gt;In scenarios like this one, MongoDB&apos;s query optimizer has each replica set execute an IXSCAN operation to find and filter the documents.  For performance reasons, I believe that in these scenarios MongoDB should always do a full collection scan, because the chunk shard key bounds effectively make the IXSCAN unnecessary.  We already know that every document, or a large portion of the documents, in the chunk will have to be scanned.  In cases like this a COLLSCAN operation is far more efficient.&lt;/p&gt;

&lt;p&gt;I&apos;ve seen this behavior on shard key range scans that cover a small percentage of the documents in a collection.  I&apos;ve also seen the optimizer pick this behavior when the query bounds target every document/chunk in a sharded collection.  In both of these cases a full collection scan is the best option.&lt;/p&gt;

&lt;p&gt;Ideally what I think should happen is:&lt;br/&gt;
1. Mongos figures out which data chunks hold data for the query&apos;s range bounds on the shard key, as it currently does&lt;br/&gt;
2. Mongos sends the query down to mongod&lt;br/&gt;
3. Mongod&apos;s optimizer recognizes that a collection scan is more efficient and does that instead of an index scan&lt;/p&gt;

&lt;p&gt;If step 3 can&apos;t happen, maybe add a special query hint: not a full collection scan hint, but one that says to do a full scan of whatever data chunks are left after all the unnecessary chunks have been filtered out using the bounds provided on the shard key.&lt;/p&gt;

&lt;p&gt;If you need more info or don&apos;t understand what I&apos;m trying to describe, I&apos;m happy to go into even more detail.&lt;/p&gt;</description>
                <environment></environment>
        <key id="503730">SERVER-33556</key>
            <summary>range scan query optimizing</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="kyle.suarez@mongodb.com">Kyle Suarez</assignee>
                                    <reporter username="mkruse@adobe.com">Matthew Kruse</reporter>
                        <labels>
                    </labels>
                <created>Wed, 28 Feb 2018 21:13:08 +0000</created>
                <updated>Mon, 23 Apr 2018 21:59:56 +0000</updated>
                            <resolved>Mon, 26 Mar 2018 13:53:41 +0000</resolved>
                                                                    <component>Querying</component>
                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>11</watches>
                                                                                                                <comments>
                            <comment id="1844918" author="kyle.suarez" created="Mon, 26 Mar 2018 13:53:41 +0000"  >&lt;p&gt;Hey &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=mkruse%40adobe.com&quot; class=&quot;user-hover&quot; rel=&quot;mkruse@adobe.com&quot;&gt;mkruse@adobe.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;In a sharded cluster, &lt;tt&gt;mongos&lt;/tt&gt; will perform shard targeting to target only those shards that contain chunks relevant for the query. After that, it forwards the command to those servers and it&apos;s up to them to decide what plan is best. I&apos;d say that &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-13065&quot; title=&quot;Consider a collection scan even if indexed plans are available&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-13065&quot;&gt;SERVER-13065&lt;/a&gt; would be the general-case solution for both sharded and unsharded setups, so I&apos;m going to close this as a duplicate. You can watch that ticket for updates.&lt;/p&gt;

&lt;p&gt;Thanks for taking the time to file this improvement and make MongoDB better &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Kyle&lt;/p&gt;</comment>
                            <comment id="1843369" author="mkruse@adobe.com" created="Fri, 23 Mar 2018 15:44:59 +0000"  >&lt;p&gt;Kyle, the issue you linked to this one is effectively the same problem.  The optimizer needs to do a better job of recognizing when to abandon an index range scan and switch to a collection scan in certain cases.  I could see this happening in both sharded and unsharded mongo setups.&lt;/p&gt;

&lt;p&gt;Assuming mongod can do chunk pruning based on a shard key or an index range, the other bug you referenced is effectively the same problem.  If mongod behaves differently in this respect between sharded and unsharded setups, then this issue is distinct.&lt;/p&gt;

&lt;p&gt;I think treating an unsharded setup&apos;s _id index as the &apos;shard key&apos; is equivalent to a sharded configuration, since data chunks are built in ranges off of these in both cases.&lt;/p&gt;</comment>
                            <comment id="1843352" author="kyle.suarez" created="Fri, 23 Mar 2018 15:39:44 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=mkruse%40adobe.com&quot; class=&quot;user-hover&quot; rel=&quot;mkruse@adobe.com&quot;&gt;mkruse@adobe.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Even in non-sharded environments, it would make sense to consider collection scans when an index scan would be unselective, which is described in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-13065&quot; title=&quot;Consider a collection scan even if indexed plans are available&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-13065&quot;&gt;SERVER-13065&lt;/a&gt;. Would that satisfy your feature request? If so, I&apos;d like to close this ticket as a duplicate to track the improvement in one place.&lt;/p&gt;

&lt;p&gt;Best,&lt;br/&gt;
Kyle&lt;/p&gt;</comment>
                            <comment id="1821795" author="thomas.schubert" created="Fri, 2 Mar 2018 21:04:52 +0000"  >&lt;p&gt;Thanks for the improvement request, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=mkruse%40adobe.com&quot; class=&quot;user-hover&quot; rel=&quot;mkruse@adobe.com&quot;&gt;mkruse@adobe.com&lt;/a&gt;. I&apos;ve sent it to the Sharding Team for consideration.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="118011">SERVER-13065</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 2 Mar 2018 21:04:52 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 46 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>backlog-server-pm</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 46 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>kelsey.schubert@mongodb.com</customfieldvalue>
            <customfieldvalue>kyle.suarez@mongodb.com</customfieldvalue>
            <customfieldvalue>mkruse@adobe.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htrdev:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hra19r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htqzlb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>