<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:07:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-24981] $project-$limit optimization has bad repercussion on pipeline splitting</title>
                <link>https://jira.mongodb.org/browse/SERVER-24981</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The new &lt;tt&gt;$project-$limit&lt;/tt&gt; optimization in 3.2 can cause the pipeline to be split much earlier than before (because the pipeline is split at the &lt;tt&gt;$limit&lt;/tt&gt; stage).&lt;/p&gt;

&lt;p&gt;I&apos;m attaching the explain plans of two queries: one that uses the optimization and one that doesn&apos;t, because I added a &lt;tt&gt;$redact: $$KEEP&lt;/tt&gt; just before the &lt;tt&gt;$limit&lt;/tt&gt;.&lt;br/&gt;
In the case of this query, many more fields are sent to the &lt;tt&gt;mergerPart&lt;/tt&gt; because of the splitting, which triggers very bad behavior with second batches of aggregation queries; that behavior will be described in another ticket.&lt;/p&gt;

&lt;p&gt;I think it would be good to take pipeline splitting into consideration when applying these optimizations (in addition, there is no &lt;tt&gt;$sort&lt;/tt&gt; stage that would benefit from having the &lt;tt&gt;$limit&lt;/tt&gt; moved up).&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
Antoine&lt;/p&gt;
</description>
                <environment></environment>
        <key id="299977">SERVER-24981</key>
            <summary>$project-$limit optimization has bad repercussion on pipeline splitting</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="janna.golden@mongodb.com">Janna Golden</assignee>
                                    <reporter username="antoine.hom@amadeus.com">Antoine Hom</reporter>
                        <labels>
                            <label>performance</label>
                    </labels>
                <created>Mon, 11 Jul 2016 12:54:39 +0000</created>
                <updated>Tue, 17 Apr 2018 07:05:30 +0000</updated>
                            <resolved>Thu, 7 Dec 2017 21:39:02 +0000</resolved>
                                                    <fixVersion>3.7.1</fixVersion>
                                    <component>Aggregation Framework</component>
                                        <votes>0</votes>
                                    <watches>13</watches>
                                                                                                                <comments>
                            <comment id="1745993" author="xgen-internal-githook" created="Thu, 7 Dec 2017 20:58:37 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{&apos;name&apos;: &apos;jannaerin&apos;, &apos;username&apos;: &apos;jannaerin&apos;, &apos;email&apos;: &apos;golden.janna@gmail.com&apos;}
&lt;p&gt;Message: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-24981&quot; title=&quot;$project-$limit optimization has bad repercussion on pipeline splitting&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-24981&quot;&gt;&lt;del&gt;SERVER-24981&lt;/del&gt;&lt;/a&gt; Rewrite $limit optimization&lt;br/&gt;
Branch: master&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/bbebcbfde994ec14b9fabfe17779cfb5adcda211&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/bbebcbfde994ec14b9fabfe17779cfb5adcda211&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="1697805" author="david.storch" created="Thu, 12 Oct 2017 21:44:22 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=tess.avitabile&quot; class=&quot;user-hover&quot; rel=&quot;tess.avitabile&quot;&gt;tess.avitabile&lt;/a&gt; this sounds reasonable to me. I think for now we can move this out to 3.7 Desired, but this could be a good thing for &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=janna.golden&quot; class=&quot;user-hover&quot; rel=&quot;janna.golden&quot;&gt;janna.golden&lt;/a&gt; to work on after she has a little bit of ramp time on the query team.&lt;/p&gt;</comment>
                            <comment id="1695925" author="tess.avitabile" created="Wed, 11 Oct 2017 15:13:51 +0000"  >&lt;p&gt;I like &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=charlie.swanson&quot; class=&quot;user-hover&quot; rel=&quot;charlie.swanson&quot;&gt;charlie.swanson&lt;/a&gt;&apos;s suggestion to have &lt;tt&gt;$sort&lt;/tt&gt; look ahead in the pipeline past stages that preserve the number of documents for a &lt;tt&gt;$limit&lt;/tt&gt; to coalesce with (where by ahead, I mean &lt;tt&gt;[{$sort: ...}, ..., {$limit: ...}]&lt;/tt&gt;). There is no benefit to swapping &lt;tt&gt;$limit&lt;/tt&gt; before &lt;tt&gt;$project&lt;/tt&gt; except when it can find a &lt;tt&gt;$sort&lt;/tt&gt; to coalesce with. And there is no harm in swapping &lt;tt&gt;$limit&lt;/tt&gt; before &lt;tt&gt;$project&lt;/tt&gt; when there is a &lt;tt&gt;$sort&lt;/tt&gt; earlier in the pipeline, because the pipeline will be split at &lt;tt&gt;$sort&lt;/tt&gt;, so the &lt;tt&gt;$project&lt;/tt&gt; would not be performed on the shards anyway. &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=asya&quot; class=&quot;user-hover&quot; rel=&quot;asya&quot;&gt;asya&lt;/a&gt;&apos;s suggestion to duplicate the &lt;tt&gt;$limit&lt;/tt&gt; when there is an intervening stage that increases the number of documents seems like a good extension.&lt;/p&gt;

&lt;p&gt;I do not think we can say whether it is always better/worse to swap &lt;tt&gt;$skip&lt;/tt&gt; before &lt;tt&gt;$project&lt;/tt&gt;. On a single shard, it is clearly always better. But in a sharded cluster, it depends on the expensiveness of the &lt;tt&gt;$project&lt;/tt&gt; vs. the &lt;tt&gt;$project&lt;/tt&gt;&apos;s reduction of the document size. Since we cannot determine whether a swap is an improvement, and there are no reported issues about the current optimization, I recommend we leave it as is.&lt;/p&gt;</comment>
                            <comment id="1380824" author="charlie.swanson" created="Fri, 9 Sep 2016 19:30:32 +0000"  >&lt;p&gt;I have one idea of how to fix this:&lt;br/&gt;
Our current optimization puts $limit in front of $project in hopes that it will later coalesce with a $sort. Instead, we could have $sort be responsible for looking ahead in the pipeline to try to find a $limit. The $sort could keep looking past anything like $project, $addFields, etc. that does not change the number of documents in the pipeline. This will keep as much work as possible on the shards in parallel, and still allow $sort to find a $limit.&lt;/p&gt;

&lt;p&gt;The $skip optimization might suffer from a similar problem to the one described here, and I&apos;m not sure if/how we want to address that. The $skip/$project swap was meant to reduce the amount of work done transforming documents within $project. I&apos;m tempted to think that this is still a worthwhile optimization. If so, we&apos;d want to add some special logic after splitting the pipeline to see if the next stage(s) is a $project (or again something like $addFields). If there is at least one such stage, we can move it/them back to the parallel part of the shards.&lt;/p&gt;

&lt;p&gt;If we do that second piece of work, we might not need to do the first, since the same strategy would work for $limit.&lt;/p&gt;</comment>
                            <comment id="1320229" author="ramon.fernandez" created="Mon, 11 Jul 2016 16:54:46 +0000"  >&lt;p&gt;Thanks for your reports &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=antoine.hom%40amadeus.com&quot; class=&quot;user-hover&quot; rel=&quot;antoine.hom@amadeus.com&quot;&gt;antoine.hom@amadeus.com&lt;/a&gt;, both &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-24978&quot; title=&quot;Second batches in aggregation framework are asked synchronously&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-24978&quot;&gt;&lt;del&gt;SERVER-24978&lt;/del&gt;&lt;/a&gt; and this ticket have been sent to the Query team for consideration. Please continue to watch both tickets for updates.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;</comment>
                            <comment id="1319813" author="antoine.hom@amadeus.com" created="Mon, 11 Jul 2016 13:03:28 +0000"  >&lt;p&gt;The query without redact timed out after 10+ minutes in our cluster (because of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-24978&quot; title=&quot;Second batches in aggregation framework are asked synchronously&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-24978&quot;&gt;&lt;del&gt;SERVER-24978&lt;/del&gt;&lt;/a&gt;).&lt;br/&gt;
The one with the redact step finished in 1 minute.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10420">
                    <name>Backports</name>
                                            <outwardlinks description="backported by">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10320">
                    <name>Documented</name>
                                                                <inwardlinks description="is documented by">
                                        <issuelink>
            <issuekey id="469624">DOCS-11102</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="299978">SERVER-24978</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="130247" name="explain_plan.log" size="765373" author="antoine.hom@amadeus.com" created="Mon, 11 Jul 2016 12:55:25 +0000"/>
                            <attachment id="130248" name="explain_plan_redact.log" size="1284693" author="antoine.hom@amadeus.com" created="Mon, 11 Jul 2016 12:55:25 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_12450" key="com.atlassian.jira.plugin.system.customfieldtypes:multicheckboxes">
                        <customfieldname>Backport Requested</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="15141"><![CDATA[v3.6]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10038"><![CDATA[Fully Compatible]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 11 Jul 2016 13:24:20 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        6 years, 9 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>joey</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            6 years, 9 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>antoine.hom@amadeus.com</customfieldvalue>
            <customfieldvalue>charlie.swanson@mongodb.com</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>xgen-internal-githook</customfieldvalue>
            <customfieldvalue>janna.golden@mongodb.com</customfieldvalue>
            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>
            <customfieldvalue>tess.avitabile@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrk33r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr9c4n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="1952">Query 2017-11-13</customfieldvalue>
    <customfieldvalue id="1979">Query 2017-12-04</customfieldvalue>
    <customfieldvalue id="2034">Query 2017-12-18</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hs9w7r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>