<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:09:28 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-45689] DISTINCT_SCAN candidate plans should be generated and evaluated with the multi-planner</title>
                <link>https://jira.mongodb.org/browse/SERVER-45689</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The DISTINCT_SCAN stage implements an optimized specialization of an index scan which is appropriate for a subset of &lt;tt&gt;distinct&lt;/tt&gt; or &lt;tt&gt;aggregate&lt;/tt&gt; operations. It involves skipping duplicate keys via an index seek.&lt;/p&gt;

&lt;p&gt;The planning logic for DISTINCT_SCAN is implemented outside of the planner. It involves first invoking &lt;tt&gt;QueryPlanner::plan()&lt;/tt&gt;, and then seeing if any of the resulting plans can be correctly converted to a DISTINCT_SCAN. However, as soon as we are able to construct our first DISTINCT_SCAN plan, we pass it off to the execution engine without considering other candidates. See &lt;a href=&quot;https://github.com/mongodb/mongo/blob/48cd578fa9c3ef317666ca475f9ee14c1fe0bc4f/src/mongo/db/query/get_executor.cpp#L1535-L1556&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/blob/48cd578fa9c3ef317666ca475f9ee14c1fe0bc4f/src/mongo/db/query/get_executor.cpp#L1535-L1556&lt;/a&gt;. It is possible that there are multiple DISTINCT_SCAN plans, and that one will outperform another. The efficiency of a DISTINCT_SCAN depends on the position of the field we&apos;re &quot;distincting&quot; in the index key pattern, as well as the number of unique values in the collection for the preceding key pattern fields. By simply selecting the first DISTINCT_SCAN, we might select a plan that is substantially suboptimal.&lt;/p&gt;

&lt;p&gt;Instead, we should generate a set of DISTINCT_SCAN candidate plans. These candidates could then be scored and ranked according to our usual multi-planning algorithm.&lt;/p&gt;

&lt;p&gt;A few additional concerns that come to mind:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Is there any reason to generate IXSCAN candidates alongside the DISTINCT_SCAN candidates, or is it safe to assume that DISTINCT_SCAN should always be preferred?&lt;/li&gt;
	&lt;li&gt;How hard would it be to move the DISTINCT_SCAN logic into &lt;tt&gt;QueryPlanner::plan()&lt;/tt&gt;? It&apos;s always bothered me that we have such heavy query planning logic living outside of the planning module.&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment></environment>
        <key id="1107958">SERVER-45689</key>
            <summary>DISTINCT_SCAN candidate plans should be generated and evaluated with the multi-planner</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-optimization">Backlog - Query Optimization</assignee>
                                    <reporter username="david.storch@mongodb.com">David Storch</reporter>
                        <labels>
                            <label>qopt-team</label>
                    </labels>
                <created>Tue, 21 Jan 2020 23:51:37 +0000</created>
                <updated>Wed, 4 Jan 2023 22:02:38 +0000</updated>
                                                                            <component>Querying</component>
                                        <votes>1</votes>
                                    <watches>23</watches>
                                                                                                                <comments>
                            <comment id="4941463" author="JIRAUSER1269325" created="Mon, 31 Oct 2022 17:46:20 +0000"  >&lt;p&gt;We are sending this back to the backlog and director triage for assignment.&lt;/p&gt;</comment>
                            <comment id="4142895" author="JIRAUSER1258164" created="Fri, 22 Oct 2021 18:21:52 +0000"  >&lt;p&gt;Initial thoughts:&lt;/p&gt;


&lt;ul&gt;
	&lt;li&gt;Is there any reason to generate IXSCAN candidates alongside the DISTINCT_SCAN candidates, or is it safe to assume that DISTINCT_SCAN should always be preferred?
	&lt;ul&gt;
		&lt;li&gt;As I understand it, a DISTINCT_SCAN is a long series of seeks. When the index is unique, or almost unique, you&apos;d end up doing a lot of unnecessary starting over at the top of the BTree, so at least in some cases an IXSCAN could be more performant. I&apos;d have to do some perf testing to prove that, though, or to find the number of concurrent records needed to show the difference.&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;



&lt;ul&gt;
	&lt;li&gt;I&apos;d definitely prefer to push the DISTINCT_SCAN logic into the planner, just for consistency. That would also make distinct scans available, in principle, to other situations that can take advantage of them.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="4045805" author="charlie.swanson" created="Wed, 8 Sep 2021 19:52:39 +0000"  >&lt;p&gt;Flipping this back to triage based off &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=christopher.harris&quot; class=&quot;user-hover&quot; rel=&quot;christopher.harris&quot;&gt;christopher.harris&lt;/a&gt;&apos;s comment which should help &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=steve.la&quot; class=&quot;user-hover&quot; rel=&quot;steve.la&quot;&gt;steve.la&lt;/a&gt; and &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bernard.gorman&quot; class=&quot;user-hover&quot; rel=&quot;bernard.gorman&quot;&gt;bernard.gorman&lt;/a&gt; make a more informed scheduling decision. &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="262967">SERVER-22460</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="141518">SERVER-14227</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25126"><![CDATA[Query Optimization]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000xEqBYQA0, 5002K000011EuXAQA0]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 13 Feb 2020 18:40:11 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        1 year, 14 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>james.wahlin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            1 year, 14 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-optimization</customfieldvalue>
            <customfieldvalue>brenda.rodriguez@mongodb.com</customfieldvalue>
            <customfieldvalue>charlie.swanson@mongodb.com</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>joel.redman@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hwk1p3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2yzz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="5911">QO 2022-09-19</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hwjnyf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>