<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:54:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-20616] Plan ranker sampling from the beginning of a query&apos;s execution can result in poor plan selection</title>
                <link>https://jira.mongodb.org/browse/SERVER-20616</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The query engine currently selects a winning plan among &lt;em&gt;n&lt;/em&gt; candidate plans by the following mechanism. We start executing each of the &lt;em&gt;n&lt;/em&gt; plans for a short trial period. Once the trial period ends, we use stats collected from the brief trial execution to score the plans. The most highly scored plan becomes the winner. The scoring is based on a notion of &quot;productivity&quot;, essentially the number of results produced per unit of work (where work is an artificial unit that is a proxy for CPU + IO time).&lt;/p&gt;
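
&lt;p&gt;A minimal Python sketch of the ranking mechanism just described. It is for illustration only and is not the query engine&apos;s code. Each candidate plan is modeled as a sequence of work units, and each unit either produces a result (True) or does not (False). The trial period is assumed to be a fixed budget of work units. The helper names trial_score and pick_winner are hypothetical.&lt;/p&gt;

```python
def trial_score(steps, max_works):
    """Productivity of one plan over its trial: results produced per unit of work."""
    results = 0
    works = 0
    for produced in steps:
        works += 1
        results += 1 if produced else 0
        if works == max_works:
            break  # trial period over (assumed fixed work budget)
    return results / works if works else 0.0


def pick_winner(plans, max_works=100):
    """Trial-run every candidate plan and return (winner_name, scores)."""
    scores = {name: trial_score(iter(steps), max_works) for name, steps in plans.items()}
    # max() keeps the first-listed plan when scores tie.
    return max(scores, key=scores.get), scores


plans = {
    "p1": [True, False, False, False] * 50,  # one result every four works
    "p2": [True] * 200,                      # a result on every work
}
winner, scores = pick_winner(plans)
print(winner, scores)  # p2 {'p1': 0.25, 'p2': 1.0}
```

&lt;p&gt;Because only the first max_works units are ever scored, anything a plan would do after the trial has no effect on the outcome in this sketch.&lt;/p&gt;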

&lt;p&gt;You can think of the trial execution period as a way to sample the data on the fly, a kind of adaptive query processing technique in which inferences about the data are made while the query is executing. The problem is that this sampling strategy may not accurately represent the true data distribution.&lt;/p&gt;

&lt;p&gt;For example, suppose that some candidate plan &lt;em&gt;p1&lt;/em&gt; is scanning the index {a: 1} over the interval [0, 100000] in order to answer the query predicate {a: {$gte: 0, $lte: 100000}, b: &quot;selectiveString&quot;}.  There is an alternative plan &lt;em&gt;p2&lt;/em&gt; which simply looks up all docs where &quot;b&quot; is equal to some rare string, &quot;selectiveString&quot;, in the {b: 1} index. Clearly &lt;em&gt;p2&lt;/em&gt; is the better plan: instead of scanning some large index range and filtering out all the documents without the correct value for &quot;b&quot;, we do a lookup based on the more selective predicate.&lt;/p&gt;
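
&lt;p&gt;To make the gap concrete, here is a small Python sketch that counts the work each plan does over the whole query. The data set is hypothetical: one document per value of a in [0, 100000], with b set to selectiveString on every 2000th document. One unit of work is assumed to be one index key examined.&lt;/p&gt;

```python
# Hypothetical data: one document per value of "a" in [0, 100000];
# every 2000th one has b == "selectiveString" (51 documents in total).
A_RANGE = range(0, 100001)
docs = [
    {"a": a, "b": "selectiveString" if a % 2000 == 0 else "common"}
    for a in A_RANGE
]


def run_p1(docs):
    """p1: scan {a: 1} over [0, 100000], fetch each document, filter on b."""
    results = works = 0
    for doc in sorted(docs, key=lambda d: d["a"]):
        if doc["a"] not in A_RANGE:
            continue
        works += 1  # one {a: 1} index key examined
        if doc["b"] == "selectiveString":
            results += 1
    return results, works


def run_p2(docs):
    """p2: look up b == "selectiveString" in {b: 1}, fetch, filter on a."""
    results = works = 0
    for doc in docs:
        if doc["b"] != "selectiveString":
            continue  # such keys are outside the {b: 1} lookup, so cost no work
        works += 1
        if doc["a"] in A_RANGE:
            results += 1
    return results, works


print(run_p1(docs))  # (51, 100001)
print(run_p2(docs))  # (51, 51)
```

&lt;p&gt;Both plans return the same 51 documents, but p1 does roughly 2000 times as much work.&lt;/p&gt;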

&lt;p&gt;However, depending on the data, the plan ranking algorithm described above might lead us to choose the suboptimal plan &lt;em&gt;p1&lt;/em&gt; instead of &lt;em&gt;p2&lt;/em&gt;. Suppose that there is a correlation between the &quot;a&quot; and &quot;b&quot; fields so that all of the documents where &quot;b&quot; is &quot;selectiveString&quot; also have an &quot;a&quot; value of 0. Since &lt;em&gt;p1&lt;/em&gt; scans index {a: 1} in order, it encounters all the documents where &quot;a&quot; is 0 and &quot;b&quot; is &quot;selectiveString&quot; first, during the trial execution period. This can spuriously make &lt;em&gt;p1&lt;/em&gt; look as good as &lt;em&gt;p2&lt;/em&gt;! If we instead sampled randomly from the documents where &quot;a&quot; is in the interval [0, 100000], the query engine would quickly have observed that plan &lt;em&gt;p1&lt;/em&gt; is much slower than &lt;em&gt;p2&lt;/em&gt;. Real data is often correlated, so although this example is fairly contrived, the problem can occur in practice.&lt;/p&gt;</description>
                <environment></environment>
        <key id="231382">SERVER-20616</key>
            <summary>Plan ranker sampling from the beginning of a query&apos;s execution can result in poor plan selection</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-optimization">Backlog - Query Optimization</assignee>
                                    <reporter username="david.storch@mongodb.com">David Storch</reporter>
                        <labels>
                            <label>bonsai</label>
                    </labels>
                <created>Thu, 24 Sep 2015 17:38:10 +0000</created>
                <updated>Thu, 30 Nov 2023 11:22:08 +0000</updated>
                                            <version>2.6.11</version>
                    <version>3.0.6</version>
                    <version>3.2.0</version>
                                                    <component>Querying</component>
                                        <votes>11</votes>
                                    <watches>55</watches>
                                                                                                                <comments>
                            <comment id="4761197" author="JIRAUSER1265607" created="Thu, 18 Aug 2022 17:18:51 +0000"  >&lt;p&gt;This can be reproduced with the following PyMongo script:&lt;/p&gt;


&lt;p&gt;import pymongo&lt;/p&gt;

&lt;p&gt;# Assumes a mongod running locally on the default port.&lt;br/&gt;
client = pymongo.MongoClient()&lt;br/&gt;
coll = client.test.indextie&lt;/p&gt;

&lt;p&gt;coll.delete_many({})&lt;/p&gt;

&lt;p&gt;# Drop every index except the default _id_ index.&lt;br/&gt;
indexes = coll.index_information()&lt;br/&gt;
for key in indexes.keys():&lt;br/&gt;
&#160; &#160; if key != &quot;_id_&quot;:&lt;br/&gt;
&#160; &#160; &#160; &#160; coll.drop_index(key)&lt;/p&gt;

&lt;p&gt;coll.create_index([(&quot;a&quot;, pymongo.ASCENDING)])&lt;br/&gt;
coll.create_index([(&quot;b&quot;, pymongo.ASCENDING)])&lt;/p&gt;

&lt;p&gt;data = []&lt;/p&gt;

&lt;p&gt;# 1000 documents that match {a: 1, b: 1}, inserted first.&lt;br/&gt;
for i in range(1000):&lt;br/&gt;
&#160; &#160; data.append({&quot;a&quot;: 1, &quot;b&quot;: 1})&lt;/p&gt;

&lt;p&gt;# 1000 more documents with a: 1; only the one with b == 1 matches the query.&lt;br/&gt;
for i in range(1000):&lt;br/&gt;
&#160; &#160; data.append({&quot;a&quot;: 1, &quot;b&quot;: i})&lt;/p&gt;

&lt;p&gt;coll.insert_many(data)&lt;br/&gt;
print(coll.find({&quot;a&quot;: 1, &quot;b&quot;: 1}).explain())&lt;/p&gt;
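
&lt;p&gt;What the plan ranker sees for this data during the trial period can be modeled in plain Python without a server. This is an illustrative sketch, not MongoDB code. It assumes that index entries with equal keys are visited in insertion (RecordId) order and that the trial lasts 101 units of work. The helper names are hypothetical.&lt;/p&gt;

```python
# Hypothetical, server-free model of the trial period for the repro data above.
# Assumes index entries with equal keys are visited in insertion (RecordId) order
# and that the trial lasts 101 units of work.
data = [{"a": 1, "b": 1} for _ in range(1000)]
data += [{"a": 1, "b": i} for i in range(1000)]


def index_scan(field, value):
    """Yield one item per index key with field == value: True if that document
    matches the query {a: 1, b: 1}, else False."""
    for doc in data:
        if doc[field] == value:
            yield doc["a"] == 1 and doc["b"] == 1


def trial_productivity(steps, works=101):
    """Results per unit of work over the first `works` index keys."""
    return sum(next(steps) for _ in range(works)) / works


def full_productivity(steps):
    """Results per unit of work over the whole scan."""
    produced = list(steps)
    return sum(produced) / len(produced)


for field in ("a", "b"):
    print(field, trial_productivity(index_scan(field, 1)), full_productivity(index_scan(field, 1)))
# a 1.0 0.5005
# b 1.0 1.0
```

&lt;p&gt;Both indexes score 1.0 during the trial. Over the whole scan, however, the {a: 1} plan returns a result on only about half of its work. This is the same early-sampling effect the issue describes.&lt;/p&gt;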


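&lt;p&gt;The query should use index {b: 1}, but index {a: 1} was chosen instead. During the trial period both plans produce a result on every unit of work, because the first 1000 documents in index a also have b: 1. The plans therefore tie, and index a wins because it was created first.&lt;/p&gt;</comment>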
                            <comment id="1207561" author="hoyt ren" created="Fri, 18 Mar 2016 05:37:02 +0000"  >&lt;p&gt;I tried v3.2.4 and still see the problem.&lt;/p&gt;</comment>
                            <comment id="1124773" author="hoyt ren" created="Sun, 3 Jan 2016 09:49:44 +0000"  >&lt;p&gt;I tried v3.2 and the problem is still there.&lt;/p&gt;</comment>
                            <comment id="1106696" author="hoyt ren" created="Mon, 7 Dec 2015 02:36:08 +0000"  >&lt;p&gt;Yes, it&apos;s a good direction to try. I believe the difficulty is doing this without hurting query performance much. Maybe we could keep (or store) some statistics in the index?&lt;/p&gt;

&lt;p&gt;By the way, I found the issue about manually controlling index selection, but it seems nobody has updated it in a long time.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-7944&quot; title=&quot;add index hint support for operations that read indexes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-7944&quot;&gt;&lt;del&gt;SERVER-7944&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="1105793" author="david.storch" created="Fri, 4 Dec 2015 18:07:39 +0000"  >&lt;p&gt;Here&apos;s one idea of how to fix this problem. We could extend the MultiPlanStage / CachedPlanStage trial period indefinitely. Instead of buffering the first batch of results and then throwing them out if we have to re-plan the query, we could keep a running estimate of index selectivity. If at any point during query execution the selectivity drops off, we could trigger a replan. This would probably involve buffering RecordIds for every document in the result set so that we can dedup after a replan.&lt;/p&gt;
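
&lt;p&gt;A rough Python sketch of this idea, under several assumptions that are not in the comment above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each plan is modeled as an iterator that yields one item per unit of work: a RecordId when a result is produced, otherwise None.&lt;/li&gt;
&lt;li&gt;Productivity over a sliding window of recent work stands in for the running selectivity estimate.&lt;/li&gt;
&lt;li&gt;The fallback plan is fixed in advance rather than chosen by re-running the ranker.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The function execute_with_replan and its window and min_rate parameters are hypothetical. The data scales down the correlation from the issue description: every selectiveString document has a = 0.&lt;/p&gt;

```python
from collections import deque
from operator import lt


def execute_with_replan(plan, fallback, window=50, min_rate=0.5):
    """Run `plan`, tracking productivity over its last `window` units of work.

    Each plan yields one item per unit of work: a RecordId when that work
    produced a result, or None when it did not. If windowed productivity falls
    below `min_rate`, switch to `fallback`, skipping RecordIds already returned.
    Returns (results, replan_at); replan_at is None if no replan happened.
    """
    returned = set()               # RecordIds already returned, buffered for dedup
    results = []
    recent = deque(maxlen=window)  # 1 if a unit of work produced a result, else 0
    works = 0
    for rid in plan:
        works += 1
        recent.append(0 if rid is None else 1)
        if rid is not None:
            returned.add(rid)
            results.append(rid)
        if len(recent) == window and lt(sum(recent) / window, min_rate):
            # Productivity dropped off: replan and dedup against earlier results.
            for other in fallback:
                if other is not None and other not in returned:
                    returned.add(other)
                    results.append(other)
            return results, works
    return results, None


# Correlated data as in the description, scaled down: RecordIds 0-19 have
# a == 0 and b == "selectiveString"; RecordIds 20-10019 have a == 1..10000.
docs = [{"rid": r, "a": 0, "b": "selectiveString"} for r in range(20)]
docs += [{"rid": r, "a": r - 19, "b": "common"} for r in range(20, 10020)]


def plan_a():
    """Scan {a: 1} in order and filter on b."""
    for doc in sorted(docs, key=lambda d: d["a"]):
        yield doc["rid"] if doc["b"] == "selectiveString" else None


def plan_b():
    """Look up b == "selectiveString" in {b: 1}."""
    for doc in docs:
        if doc["b"] == "selectiveString":
            yield doc["rid"]


results, replan_at = execute_with_replan(plan_a(), plan_b())
print(len(results), replan_at)  # 20 50
```

&lt;p&gt;Without the replan, the {a: 1} plan would examine all 10020 keys. Here it gives up after 50, and the RecordId set keeps the fallback plan from returning any document twice. Sorted queries are not handled in this sketch.&lt;/p&gt;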

&lt;p&gt;There would be a few details to work out. For example, how many RecordIds would we be willing to store for each query? How would we deal with sorts? We would probably have to remember the sort key of the last document returned and throw out any results from the new plan on the wrong side of the sort.&lt;/p&gt;</comment>
                            <comment id="1091355" author="hoyt ren" created="Wed, 18 Nov 2015 02:26:27 +0000"  >&lt;p&gt;Hello friends,&lt;/p&gt;

&lt;p&gt;Since making an AI isn&apos;t easy, would allowing the index to be selected manually, as in a traditional DB, be a quicker fix? I saw some people here mention this, but I can&apos;t remember the issue ID. I don&apos;t think asking users to supply some additional parameters would be a problem, since we have all used SQL that way for a long time. However, if Mongo does achieve such an AI, it will be a great achievement, like many others we can see in today&apos;s Mongo. Thank you very much for your exciting work.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                            <outwardlinks description="depends on">
                                                        </outwardlinks>
                                                                <inwardlinks description="is depended on by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="236525">SERVER-21178</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="2485075">SERVER-82548</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="120781">SERVER-13211</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1276498">SERVER-46904</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="231398">SERVER-20619</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25126"><![CDATA[Query Optimization]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[500A000000ZdPiwIAF, 500A000000VmkpZIAR, 500A000000UabERIAZ, 500A000000UabroIAB, 500A000000XVis0IAD, 500A000000ZQQ55IAH, 5002K00000eAI0NQAW, 5002K00000e98g5QAA, 5002K00000fGEKbQAO, 5002K00000fHUpzQAG, 5002K00000gm1k1QAA, 5002K00000jby6oQAA, 5002K00000kE9UoQAK, 5002K00000msTiRQAU, 5002K00000mqZjkQAE, 5002K00000nnAe6QAE, 5002K00000ocJKdQAM, 5002K00000r3obiQAA, 5002K00000r58FfQAI, 5002K00000xkQE8QAM, 5002K00000yVR6yQAG, 5002K0000130Y0BQAU, 5006R00001lRaVdQAK, 5006R00001me9eDQAQ, 5006R00001nLXMpQAO, 5006R00001pm1fAQAQ, 5006R00001r8xsZQAQ, 5006R00001sC0xpQAC, 5006R00001srFDFQA2, 5006R00001uLtUHQA0]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 18 Nov 2015 02:26:27 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        1 year, 24 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[<s><a href='https://jira.mongodb.org/browse/TSEXP-1758'>TSEXP-1758</a></s>]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ivan.fefer@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            1 year, 24 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-optimization</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>Hoyt Ren</customfieldvalue>
            <customfieldvalue>xiaochen.wu@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrktmv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2lif:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsfmjr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>