<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:29:52 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-12865] MultiKey Cardinality issues in index intersection</title>
                <link>https://jira.mongodb.org/browse/SERVER-12865</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Consider the following query:&lt;/p&gt;

&lt;p&gt;db.data.find({&lt;br/&gt;
    date: {$gt: ISODate(&quot;1970-01-01T00:00:00.000Z&quot;)},&lt;br/&gt;
    tags: {$all: [&apos;youtube&apos;, &apos;abc&apos;]}&lt;br/&gt;
}).limit(100).sort({date: -1})&lt;/p&gt;

&lt;p&gt;The collection has a compound index on tags and date; date is used for the sort.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Behavior in 2.4:&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;The query scans the index for &quot;youtube&quot; (because it is the first element of the $all array) and then filters the resulting documents for &quot;abc&quot;. Now imagine that &quot;youtube&quot; is a tag in 10M documents, whereas &quot;abc&quot; is a tag in only 5. Clearly, we would get much better performance by scanning the index for &quot;abc&quot; first instead of &quot;youtube&quot;.&lt;/p&gt;

&lt;p&gt;If we know something about the cardinality of our data, we can exploit this in 2.4 simply by making &quot;abc&quot; the first element of the $all array. Many people have used this to turn a 30-second query into a 10 ms query.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Behavior in 2.6.0-rc0:&lt;/b&gt;&lt;br/&gt;
The query intersects the index with itself: it scans the 5 &quot;abc&quot; index entries, then scans as many of the &quot;youtube&quot; entries as necessary to be sure it has found all relevant documents (that is, until it reaches a DiskLoc &amp;gt; the last DiskLoc of the &quot;abc&quot; documents).&lt;/p&gt;

&lt;p&gt;On average this should be much better than the 30-second query, but it can still be just as bad. On average it will scan about half of the 10M index entries for &quot;youtube&quot;, which is far worse than the 2.4 optimization of scanning for &quot;abc&quot; first. Worse yet, performance is wildly inconsistent, ranging from a few ms to as much as 30 seconds, even when we know the cardinality differences in our data.&lt;/p&gt;
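&lt;p&gt;To make the comparison concrete, here is a toy Python model of how many index keys each strategy examines. It is a sketch under stated assumptions, not MongoDB&apos;s planner code. Each tag is modeled as a sorted list of integer record ids standing in for DiskLocs. The counts are scaled down from the 10M/5 example. The intersection model follows the description of the 2.6.0-rc0 behavior: scan the rare term, then scan the common term up to the last record id of the rare term. The function and variable names are hypothetical.&lt;/p&gt;

```python
from bisect import bisect_right

# Toy model only: each tag maps to a sorted list of integer record ids,
# standing in for the DiskLocs stored in the tags/date index.


def keys_scanned_first_term(postings, terms):
    """2.4-style plan: scan the index for the first $all element only;
    the remaining elements are checked against each fetched document."""
    return len(postings[terms[0]])


def keys_scanned_rarest_term(postings, terms):
    """Plan a cardinality-aware optimizer could pick: scan the rarest element."""
    return min(len(postings[t]) for t in terms)


def keys_scanned_intersection(postings, terms):
    """Rough model of index intersection as described in this ticket: every
    term's ids are scanned up to the last id of the rarest term."""
    rarest = min(terms, key=lambda t: len(postings[t]))
    cutoff = postings[rarest][-1]
    # bisect_right counts the ids that are not greater than the cutoff.
    return sum(bisect_right(postings[t], cutoff) for t in terms)


common = list(range(10_000))  # "youtube": scaled-down stand-in for 10M docs
abc_late = {"youtube": common, "abc": [10, 2500, 5000, 7500, 9990]}
abc_early = {"youtube": common, "abc": [1, 2, 3, 4, 5]}

for name, postings in [("abc_late", abc_late), ("abc_early", abc_early)]:
    print(
        name,
        keys_scanned_first_term(postings, ["youtube", "abc"]),
        keys_scanned_first_term(postings, ["abc", "youtube"]),
        keys_scanned_rarest_term(postings, ["youtube", "abc"]),
        keys_scanned_intersection(postings, ["youtube", "abc"]),
    )
```

&lt;p&gt;With these inputs, the modeled intersection examines 9996 keys when the &quot;abc&quot; ids extend to the end of the range, but only 11 when they are clustered at the start. Scanning &quot;abc&quot; first always examines 5. This mirrors the inconsistency described above.&lt;/p&gt;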

&lt;p&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Naturally, if you know your data&apos;s cardinality, it makes sense to turn index intersection off for these kinds of queries so that you can structure them deliberately. But when you do, the behavior appears to have changed in the 2.5+ releases: the first element of $all no longer guarantees which value is used for the index scan. So making &quot;abc&quot; the first element may yield either a 10 ms query or a 30-second query; the plan is no longer deterministic.&lt;/p&gt;

&lt;p&gt;In effect, a workaround that people relied on has been removed and replaced with an easier but less effective solution to this problem (index intersection).&lt;/p&gt;

&lt;p&gt;This problem would clearly be best addressed by having the database keep cardinality statistics (such as histograms) about our data. Since that may be a few releases away, it would help to have a deterministic way to decide which value to scan the index on once we opt out of index intersection. Argument order, as in 2.4, is probably the best option.&lt;/p&gt;

&lt;p&gt;I suspect the same problem exists for queries such as:&lt;/p&gt;

&lt;p&gt;db.data.find({a: {$lt: 10}, b: {$gt: 20}})&lt;/p&gt;

&lt;p&gt;If there are millions of records with a &amp;lt; 10 but only a handful with b &amp;gt; 20, it would be much faster to scan on b and then filter that subset on a, rather than intersect the result sets of a and b.&lt;/p&gt;
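&lt;p&gt;The same idea can be sketched for range predicates. The toy Python model below is an illustration under assumptions, not how the 2.6 planner works. Each field&apos;s index is modeled as a sorted list of keys. Exact per-predicate counts, taken with bisect, stand in for the cardinality statistics requested in this ticket. The predicate with fewer matching keys is chosen as the one to scan. The helper names and data are hypothetical.&lt;/p&gt;

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: sorted index keys for fields a and b (one key per document).
a_keys = sorted(i % 10 for i in range(100_000))  # every key satisfies a $lt 10
b_keys = [0] * 99_995 + [21, 22, 23, 24, 25]     # only 5 keys satisfy b $gt 20


def count_lt(keys, bound):
    """Number of index keys strictly less than bound (an $lt range scan)."""
    return bisect_left(keys, bound)


def count_gt(keys, bound):
    """Number of index keys strictly greater than bound (a $gt range scan)."""
    return len(keys) - bisect_right(keys, bound)


def pick_scan(estimates):
    """Choose the predicate whose index range contains the fewest keys;
    the other predicate would then be applied as a filter."""
    return min(estimates, key=estimates.get)


estimates = {"a": count_lt(a_keys, 10), "b": count_gt(b_keys, 20)}
print(estimates, "-> scan", pick_scan(estimates))
```

&lt;p&gt;With these inputs, the sketch picks b and scans 5 keys instead of 100,000. A real optimizer would more likely work from approximate statistics than from exact counts.&lt;/p&gt;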



</description>
                <environment></environment>
        <key id="114089">SERVER-12865</key>
            <summary>MultiKey Cardinality issues in index intersection</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="osmar.olivo">Osmar Olivo</reporter>
                        <labels>
                    </labels>
                <created>Mon, 24 Feb 2014 19:43:28 +0000</created>
                <updated>Wed, 10 Dec 2014 23:05:41 +0000</updated>
                            <resolved>Sun, 2 Mar 2014 21:46:13 +0000</resolved>
                                    <version>2.6.0-rc0</version>
                                                    <component>Querying</component>
                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="504299" author="scotthernandez" created="Mon, 24 Feb 2014 19:53:29 +0000"  >&lt;p&gt;Can you simplify this and remove the duplicate parts that overlap with &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-12499&quot; title=&quot;Unable to force predicate evaluation order with new query framework&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-12499&quot;&gt;&lt;del&gt;SERVER-12499&lt;/del&gt;&lt;/a&gt; please? If it is all duplicated, please close it as such.&lt;/p&gt;

&lt;p&gt;Correct me if I&apos;m wrong, but this sounds like it boils down to: The query optimizer needs stats on cardinality (selectivity) of values to pick the optimal indexes...&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="107879">SERVER-12499</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 24 Feb 2014 19:53:29 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        9 years, 51 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            9 years, 51 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>osmar.olivo</customfieldvalue>
            <customfieldvalue>scotthernandez</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrm0tj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrwcnb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>102753</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsh5ev:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>