<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:20:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-9507] Optimize $sort+$group+$first pipeline to avoid full index scan</title>
                <link>https://jira.mongodb.org/browse/SERVER-9507</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;This is an analogue to &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-2094&quot; title=&quot;distinct cheat with indexes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-2094&quot;&gt;&lt;del&gt;SERVER-2094&lt;/del&gt;&lt;/a&gt; (&quot;distinct cheat with indexes&quot;), but for the aggregation framework.&lt;/p&gt;

&lt;p&gt;This improvement would allow $group accumulators such as $first to take advantage of input that is already sorted, reducing the number of index entries scanned by skipping large portions of the input instead of processing them.&lt;/p&gt;

&lt;p&gt;For example, suppose a user has a collection with an index &lt;tt&gt;{x:1, y:1}&lt;/tt&gt;, and that &lt;tt&gt;x&lt;/tt&gt; has low cardinality. Consider the following pipeline:&lt;/p&gt;

&lt;pre&gt;db.foo.aggregate([{$sort: {x: 1, y: 1}}, {$group: {_id: {x: &quot;$x&quot;}, y: {$first: &quot;$y&quot;}}}])&lt;/pre&gt;

&lt;p&gt;Currently, this pipeline performs a full scan of the index. After this optimization, it will only have to scan on the order of &lt;tt&gt;|x|&lt;/tt&gt; index entries (one per distinct value of &lt;tt&gt;x&lt;/tt&gt;), which is much smaller than the number of entries in the index.&lt;/p&gt;
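&lt;p&gt;To make the &lt;tt&gt;|x|&lt;/tt&gt; estimate concrete, the JavaScript sketch below (runnable in Node.js) models the index as a sorted array of [x, y] keys and compares a full scan with a skip scan that seeks past each group after reading its first key. It is an illustration under stated assumptions, not MongoDB's implementation: the names groupFirstFullScan, seekPast and groupFirstSkipScan are hypothetical, and &lt;tt&gt;examined&lt;/tt&gt; counts only the keys handed to the $group stage, not the work done inside each seek.&lt;/p&gt;

```javascript
// Hypothetical model, not MongoDB source code: an index on {x: 1, y: 1}
// is represented as an array of [x, y] keys sorted by x, then y.

// Full scan: visit every key and keep the first y seen for each x.
// Because the keys are sorted, the first y per x is what $first returns.
function groupFirstFullScan(keys) {
  const firstY = new Map();
  let examined = 0;
  for (const [x, y] of keys) {
    examined += 1;
    if (!firstY.has(x)) firstY.set(x, y);
  }
  return { groups: [...firstY], examined };
}

// Binary search for the position of the first key whose x is greater
// than the given x, starting at `from`. Stands in for an index seek.
function seekPast(keys, x, from) {
  let lo = from;
  let hi = keys.length;
  while (hi > lo) {
    const mid = Math.floor((lo + hi) / 2);
    if (keys[mid][0] > x) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Skip scan: read one key per distinct x, then seek past the rest of its group.
function groupFirstSkipScan(keys) {
  const groups = [];
  let examined = 0;
  let pos = 0;
  while (keys.length > pos) {
    const [x, y] = keys[pos];
    examined += 1;
    groups.push([x, y]);
    pos = seekPast(keys, x, pos + 1);
  }
  return { groups, examined };
}

// Usage: 3 distinct values of x, 1000 keys each (3000 keys in total).
const keys = [1, 2, 3].flatMap((x) => Array.from({ length: 1000 }, (_, y) => [x, y]));

console.log(groupFirstFullScan(keys).examined); // 3000
console.log(groupFirstSkipScan(keys).examined); // 3
console.log(groupFirstSkipScan(keys).groups);   // [ [ 1, 0 ], [ 2, 0 ], [ 3, 0 ] ]
```

&lt;p&gt;Both functions return the same groups; the skip scan simply hands one key per distinct &lt;tt&gt;x&lt;/tt&gt; to the $group stage, so its cost tracks &lt;tt&gt;|x|&lt;/tt&gt; rather than the size of the index. In a real B-tree each seek still costs O(log n), which this model hides inside seekPast.&lt;/p&gt;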

&lt;p&gt;This ticket is filed as a result of discussion in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9272&quot; title=&quot;Querying latest document based on a set of field&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9272&quot;&gt;&lt;del&gt;SERVER-9272&lt;/del&gt;&lt;/a&gt; (full use case available there).&lt;/p&gt;</description>
                <environment></environment>
        <key id="73561">SERVER-9507</key>
            <summary>Optimize $sort+$group+$first pipeline to avoid full index scan</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="justin.seyster@mongodb.com">Justin Seyster</assignee>
                                    <reporter username="backlog-server-query">Backlog - Query Team</reporter>
                        <labels>
                            <label>4.1.3</label>
                            <label>asya</label>
                            <label>mock-pm</label>
                            <label>optimization</label>
                            <label>performance</label>
                    </labels>
                <created>Mon, 29 Apr 2013 22:10:37 +0000</created>
                <updated>Wed, 7 Sep 2022 14:02:03 +0000</updated>
                            <resolved>Wed, 26 Sep 2018 20:00:26 +0000</resolved>
                                    <version>2.4.3</version>
                                    <fixVersion>4.1.4</fixVersion>
                                    <component>Aggregation Framework</component>
                                        <votes>9</votes>
                                    <watches>26</watches>
                                                                                                                <comments>
                            <comment id="2015443" author="xgen-internal-githook" created="Wed, 26 Sep 2018 19:59:37 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;Justin Seyster&apos;, &apos;email&apos;: &apos;justin.seyster@mongodb.com&apos;, &apos;username&apos;: &apos;jseyster&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-9507&quot; title=&quot;Optimize $sort+$group+$first pipeline to avoid full index scan&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-9507&quot;&gt;&lt;del&gt;SERVER-9507&lt;/del&gt;&lt;/a&gt; Optimize $sort+$group+$first pipeline to use DISTINCT_SCAN&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/da63195cc421f8f29c1c0bef5fa2c2226d230dfd&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/da63195cc421f8f29c1c0bef5fa2c2226d230dfd&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="2009156" author="ian@10gen.com" created="Thu, 20 Sep 2018 14:22:59 +0000"  >&lt;p&gt;Target date: definitely end of this sprint (8 weeks).&lt;br/&gt;
&quot;even with build baron&quot; - Justin&lt;/p&gt;</comment>
                            <comment id="1995101" author="ian@10gen.com" created="Thu, 6 Sep 2018 14:18:39 +0000"  >&lt;p&gt;Target date: end of this sprint (6 weeks).&lt;/p&gt;</comment>
                            <comment id="326969" author="rassi@10gen.com" created="Thu, 2 May 2013 20:45:18 +0000"  >&lt;p&gt;Correct: the aggregation framework currently cannot use an index to optimize those pipelines (a limitation unrelated to this ticket). If an index cannot be used to satisfy a $sort, an in-memory sort is performed, so every document in the pipeline must be examined anyway and no significant performance improvement is possible. If an index can be used to satisfy a $sort, and a later pipeline stage needs only a small subset of the documents (in a way that can exploit the sort order), then the optimization suggested here will drastically reduce the number of index entries scanned.&lt;/p&gt;</comment>
                            <comment id="324580" author="mnsndrs" created="Tue, 30 Apr 2013 03:02:39 +0000"  >&lt;p&gt;What if I need to run other pipeline operators, such as $project and $unwind, before this set of operators? As far as I know, indexes can no longer be used after such a transformation.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                        <issuelink>
            <issuekey id="585287">SERVER-36517</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10320">
                    <name>Documented</name>
                                                                <inwardlinks description="is documented by">
                                        <issuelink>
            <issuekey id="947125">DOCS-13065</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="71051">SERVER-9272</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="433161">SERVER-31269</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="613671">SERVER-37459</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="26715">SERVER-4507</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="279826">SERVER-23732</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="377780">SERVER-28980</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="622983">SERVER-37715</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="609316">SERVER-37304</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="715512">SERVER-40090</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1660757">SERVER-55576</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="2127296">SERVER-69359</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="13767">SERVER-2130</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="351626">SERVER-27915</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="13691">SERVER-2094</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="158725">SERVER-15291</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="384237">SERVER-29244</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[500A000000bxdd1IAA, 5002K00000f1ovpQAA, 500A000000WjbLPIAZ, 5002K00000fMavpQAC, 5002K00000dP25nQAC]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 30 Apr 2013 03:02:39 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 20 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_17052" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Downstream Changes Summary</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>The section titled &amp;quot;Pipeline Operators and Indexes&amp;quot; from &lt;a href=&quot;https://docs.mongodb.com/manual/core/aggregation-pipeline/#pipeline-operators-and-indexes&quot;&gt;https://docs.mongodb.com/manual/core/aggregation-pipeline/#pipeline-operators-and-indexes&lt;/a&gt; should be updated due to this change. It currently lists $match, $sort, and $geoNear as eligible to result in index use. However, this list is not exhaustive. Due to this change, a pipeline with a $group at the beginning can use an index, even if there is no $sort stage.&lt;br/&gt;
&lt;br/&gt;
In addition to documenting this new behavior, I suggest that we change the language of this page so that it does not claim to be exhaustive. That is, it should not say &amp;quot;Stages X, Y, and Z can result in index use.&amp;quot; Instead, it should say something like &amp;quot;MongoDB&amp;#39;s query planner analyzes an aggregation pipeline in order to determine whether indexes can be used to accelerate the operation. For example, an index can be used for filtering if a $match is at the beginning of the pipeline, or can be moved to the beginning of the pipeline by the optimizer. Similarly, a $sort at the beginning of the pipeline can be computed by scanning an index in order. As a final example, $group stages which obtain the distinct values of a field can use an index for the distinct operation if they occur at the beginning of the pipeline.&amp;quot;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_17050" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Downstream Team Attention</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16942"><![CDATA[Needed]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-1126</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>yuan.fang@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 20 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-query</customfieldvalue>
            <customfieldvalue>xgen-internal-githook</customfieldvalue>
            <customfieldvalue>ian@mongodb.com</customfieldvalue>
            <customfieldvalue>rassi</customfieldvalue>
            <customfieldvalue>justin.seyster@mongodb.com</customfieldvalue>
            <customfieldvalue>mnsndrs</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrmv7z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr8jmn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7125</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="2301">Query 2018-06-04</customfieldvalue>
    <customfieldvalue id="2440">Query 2018-08-13</customfieldvalue>
    <customfieldvalue id="2441">Query 2018-08-27</customfieldvalue>
    <customfieldvalue id="2466">Query 2018-09-10</customfieldvalue>
    <customfieldvalue id="2467">Query 2018-09-24</customfieldvalue>
    <customfieldvalue id="2487">Query 2018-10-08</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_17051" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Teams Impacted</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="16944"><![CDATA[Docs]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrjd7j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>