<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:06:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-4566] In the new aggregation framework, $sort and $limit in the pipeline seem to load all the matched data into memory. When we tried to improve performance with multi-threaded aggregation, this made it much slower than a single thread</title>
                <link>https://jira.mongodb.org/browse/SERVER-4566</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Overall, the new aggregation framework is faster than map-reduce in our application. However, when I tried to improve performance further by sending multiple aggregation commands from multiple threads and then doing the final aggregation in application space, I noticed that each thread loads a lot of data into memory, which makes each command take minutes to finish. (I do have an index on the sort key.)&lt;/p&gt;</description>
                <environment>OSX 10.6, Java, mongodb is built from Dec. 15 master branch.</environment>
        <key id="27314">SERVER-4566</key>
            <summary>In the new aggregation framework, $sort and $limit in the pipeline seem to load all the matched data into memory. When we tried to improve performance with multi-threaded aggregation, this made it much slower than a single thread</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="cwestin">Chris Westin</assignee>
                                    <reporter username="xiaofeng">Xiaofeng Wu</reporter>
                        <labels>
                    </labels>
                <created>Tue, 27 Dec 2011 23:24:02 +0000</created>
                <updated>Mon, 11 Jul 2016 18:35:41 +0000</updated>
                            <resolved>Mon, 30 Jan 2012 19:43:49 +0000</resolved>
                                    <version>2.1.1</version>
                                                    <component>Aggregation Framework</component>
                                        <votes>0</votes>
                                    <watches>2</watches>
                                                                                                                <comments>
                            <comment id="83428" author="cwestin" created="Mon, 30 Jan 2012 19:43:49 +0000"  >&lt;p&gt;Short-term fix was &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-3832&quot; title=&quot;aggregation:  early $sort should be optimized to use an index if possible&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-3832&quot;&gt;&lt;del&gt;SERVER-3832&lt;/del&gt;&lt;/a&gt;; other improvements listed are further off in the future.&lt;/p&gt;</comment>
                            <comment id="76628" author="cwestin" created="Thu, 29 Dec 2011 18:53:47 +0000"  >&lt;p&gt;No, you want to specify the sort, but the optimization I&apos;m referring to will convert that into an index scan, so the data won&apos;t be loaded into memory to be sorted.&lt;/p&gt;

&lt;p&gt;Given the two indexes you have, they will compete over the predicate on sId vs the sort on intId.  The predicate will want to use the composite index, while the sort will want to use the plain intId index.  At present, it seems like the optimizer will choose the composite index, which means you&apos;ll still have to sort all the data in order to do it this way.  If you have multiple $sort/$skip/$limit threads running in parallel, they will all do that.  You may have to wait for $out to collect the output of the sort after it has been done once, and then use that as the input for the next stage.&lt;/p&gt;

&lt;p&gt;For this particular case, there&apos;s a different optimization that seems like it would be a better idea.  If there were a composite index on &amp;lt;sId, aid&amp;gt;, then we could take advantage of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-4507&quot; title=&quot;aggregation:  optimize $group to take advantage of sorted sequences&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-4507&quot;&gt;SERVER-4507&lt;/a&gt; (but note that&apos;s further off in the future).&lt;/p&gt;</comment>
                            <comment id="76449" author="xiaofeng" created="Thu, 29 Dec 2011 00:19:12 +0000"  >&lt;p&gt;Thanks for your explanation. I have a composite index for sId and intId, and another index for intId. Can I just skip $sort?&lt;/p&gt;</comment>
                            <comment id="76354" author="cwestin" created="Wed, 28 Dec 2011 18:21:00 +0000"  >&lt;p&gt;This is not surprising: a sort needs all of the data before it can be carried out.  However, if you have the required indexes, that step can be skipped, and the optimization described by &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-3832&quot; title=&quot;aggregation:  early $sort should be optimized to use an index if possible&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-3832&quot;&gt;&lt;del&gt;SERVER-3832&lt;/del&gt;&lt;/a&gt; (linked above) will come into play.  That&apos;s not very difficult, but I haven&apos;t had a chance to finish it yet.  It should be done in the next week or two, barring interruptions.&lt;/p&gt;

&lt;p&gt;Just to be sure that optimization will apply in this case, can you tell me what indexes you&apos;ve got on the surveyResponse collection?&lt;/p&gt;</comment>
                            <comment id="76325" author="xiaofeng" created="Wed, 28 Dec 2011 16:46:17 +0000"  >&lt;p&gt;I have tested with 3 threads and with 4 threads.&lt;/p&gt;</comment>
                            <comment id="76252" author="eliot" created="Wed, 28 Dec 2011 05:56:30 +0000"  >&lt;p&gt;How many were you running in parallel?&lt;/p&gt;</comment>
                            <comment id="76229" author="xiaofeng" created="Wed, 28 Dec 2011 01:35:14 +0000"  >&lt;p&gt;Full command for multi-threaded aggregation:&lt;/p&gt;

&lt;p&gt;db.runCommand( { aggregate : &quot;surveyResponse&quot;, pipeline : [&lt;br/&gt;
{ $match : { sId: &quot;4ed52f6601bf2abf47697b8c&quot;, excluded: 0, $nor: [ { status: -99 }, { status: -50 }, { status: -60 } ] } },&lt;br/&gt;
{ $sort: { intId : 1 } },&lt;br/&gt;
{ $skip: 2000 },&lt;br/&gt;
{ $limit: 1000 },&lt;br/&gt;
{ $project: {&lt;br/&gt;
responseId : &quot;$_id&quot;,&lt;br/&gt;
intId: 1,&lt;br/&gt;
status: 1,&lt;br/&gt;
&quot;questions&quot;: 1 } },&lt;br/&gt;
{ $unwind : &quot;$questions&quot; },&lt;br/&gt;
{ $unwind : &quot;$questions.answers&quot; },&lt;br/&gt;
{ $group : { _id: &quot;$aid&quot;,&lt;br/&gt;
count : { $sum : 1 },&lt;br/&gt;
} }&lt;br/&gt;
] });&lt;/p&gt;

&lt;p&gt;Full command without multi-threading:&lt;/p&gt;

&lt;p&gt;db.runCommand( { aggregate : &quot;surveyResponse&quot;, pipeline : [&lt;br/&gt;
{ $match : { sId: &quot;4ed52f6601bf2abf47697b8c&quot;, excluded: 0, $nor: [ { status: -99 }, { status: -50 }, { status: -60 } ] } },&lt;br/&gt;
{ $project: {&lt;br/&gt;
responseId : &quot;$_id&quot;,&lt;br/&gt;
intId: 1,&lt;br/&gt;
status: 1,&lt;br/&gt;
&quot;questions&quot;: 1 } },&lt;br/&gt;
{ $unwind : &quot;$questions&quot; },&lt;br/&gt;
{ $unwind : &quot;$questions.answers&quot; },&lt;br/&gt;
{ $group : { _id: &quot;$aid&quot;,&lt;br/&gt;
count : { $sum : 1 },&lt;br/&gt;
} }&lt;br/&gt;
] });&lt;/p&gt;</comment>
                            <comment id="76224" author="eliot" created="Wed, 28 Dec 2011 00:54:19 +0000"  >&lt;p&gt;Can you send the full command?&lt;br/&gt;
What you posted before doesn&apos;t need map/reduce or a pipeline.&lt;/p&gt;</comment>
                            <comment id="76216" author="xiaofeng" created="Tue, 27 Dec 2011 23:51:03 +0000"  >&lt;p&gt;Here are the commands we generated in different threads:&lt;br/&gt;
db.runCommand( { aggregate : &quot;myCollection&quot;, pipeline : [&lt;br/&gt;
{ $match : { sId: &quot;4ed52f6601bf2abf47697b8c&quot;, excluded: 0, $nor: [ { status: -99 }, { status: -50 }, { status: -60 } ] } },&lt;br/&gt;
{ $sort: { intId : 1 } },&lt;br/&gt;
{ $limit: 1000 },&lt;br/&gt;
{ $project: {...}},&lt;br/&gt;
..]});&lt;/p&gt;

&lt;p&gt;db.runCommand( { aggregate : &quot;myCollection&quot;, pipeline : [&lt;br/&gt;
{ $match : { sId: &quot;4ed52f6601bf2abf47697b8c&quot;, excluded: 0, $nor: [ { status: -99 }, { status: -50 }, { status: -60 } ] } },&lt;br/&gt;
{ $sort: { intId : 1 } },&lt;br/&gt;
{ $skip: 1000 },&lt;br/&gt;
{ $limit: 1000 },&lt;br/&gt;
{ $project: {...}},&lt;br/&gt;
..]});&lt;/p&gt;

&lt;p&gt;db.runCommand( { aggregate : &quot;myCollection&quot;, pipeline : [&lt;br/&gt;
{ $match : { sId: &quot;4ed52f6601bf2abf47697b8c&quot;, excluded: 0, $nor: [ { status: -99 }, { status: -50 }, { status: -60 } ] } },&lt;br/&gt;
{ $sort: { intId : 1 } },&lt;br/&gt;
{ $skip: 2000 },&lt;br/&gt;
{ $limit: 1000 },&lt;br/&gt;
{ $project: {...}},&lt;br/&gt;
..]});&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                            <outwardlinks description="depends on">
                                        <issuelink>
            <issuekey id="22191">SERVER-3832</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is depended on by">
                                        <issuelink>
            <issuekey id="26715">SERVER-4507</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 28 Dec 2011 00:54:19 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        12 years, 3 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[<s><a href='https://jira.mongodb.org/browse/SERVER-3832'>SERVER-3832</a></s>]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            12 years, 3 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[OS X]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>cwestin</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>xiaofeng</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hroi9b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hritfr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>23422</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrj9fr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>