<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:24:30 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-30633] Large performance regression for large aggregation queries</title>
                <link>https://jira.mongodb.org/browse/SERVER-30633</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;MongoDB 3.3.9 and later have a significant performance regression when running large aggregation queries.  Query times now seem to grow quadratically with query size.  I found the bug using the aggregation framework, but it may be more general.  I&apos;ve only tested up to version 3.5.11.&lt;/p&gt;

&lt;p&gt;We recently explored upgrading from our current MongoDB 3.0.11 to 3.4.x.  However, we found that the execution time of one of our queries, which averaged a respectable ~660ms, had jumped roughly 60x to a staggering ~40 000ms.  I benchmarked against a few versions to narrow it down.  All queries matched a single document.&lt;/p&gt;

&lt;p&gt;3.0.15		633ms&lt;br/&gt;
3.2.16		730ms&lt;br/&gt;
3.3.6		687ms&lt;br/&gt;
3.3.8		661ms&lt;br/&gt;
3.3.9		132 089ms&lt;br/&gt;
3.3.10		134 875ms&lt;br/&gt;
3.3.11		38 787ms&lt;br/&gt;
3.4.0		38 117ms&lt;br/&gt;
3.4.4		38 348ms&lt;br/&gt;
3.4.7		38 097ms		&lt;br/&gt;
3.5.11		41 480ms&lt;/p&gt;

&lt;p&gt;It appears something changed in 3.3.9 and was then partially improved in 3.3.11. None of the JIRA issues over those versions stood out to me, so I assume it&#8217;s a side effect or something else.&lt;/p&gt;

&lt;p&gt;Our aggregation query happens to be very long, at ~300K lines, and is auto-generated.  The query essentially performs a long list of $max operations in the $group stage to compute the union of many HyperLogLog (HLL) counters.  A similarly long $project stage just renames the fields. The schema has ~75 counters with ~1K buckets per counter, each bucket having its own field.  I&#8217;ve attached a sample generated query, a sample collection with a single corresponding document, and a screenshot to help visualise the schema.&lt;/p&gt;

&lt;p&gt;I ran some benchmarks on 3.4.4, varying both the contents of the document and the number of counters queried in a single aggregation.  The former had no effect.  For the latter I took some sample times.  Note that I used a different environment from the one used for the version comparison above, so the times won&#8217;t match:&lt;/p&gt;

&lt;p&gt;counters		ms		ms/counter&lt;br/&gt;
1			100			100&lt;br/&gt;
2			182			91&lt;br/&gt;
3			272			91&lt;br/&gt;
4			367			92&lt;br/&gt;
5			495			99&lt;br/&gt;
6			637			106&lt;br/&gt;
7			840			120&lt;br/&gt;
8			965			121&lt;br/&gt;
9			1231			137&lt;br/&gt;
10			1400		140&lt;br/&gt;
11			1700			155&lt;br/&gt;
17			3933		231&lt;br/&gt;
25			8581		343&lt;br/&gt;
40			18605		465&lt;br/&gt;
50			34863		697&lt;br/&gt;
60			47746		796&lt;br/&gt;
75			67665		902&lt;br/&gt;
90			89121		990&lt;br/&gt;
100			145390		1454&lt;/p&gt;

&lt;p&gt;Initially query time looks linear, but it seems to grow quadratically as the number of counters becomes larger. I can only assume there&#8217;s a linear scan in an inner loop somewhere &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;
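
&lt;p&gt;A minimal sketch, assuming plain Python and a handful of rows copied from the table above, of how the per-counter cost can be checked. With linear scaling the per-counter time would stay roughly constant; here it keeps rising with the number of counters. The helper name ms_per_counter is hypothetical.&lt;/p&gt;

```python
# (counters, total ms) pairs copied from the benchmark table above
samples = [(1, 100), (10, 1400), (50, 34863), (100, 145390)]


def ms_per_counter(rows):
    """Return the total time divided by the counter count for each row."""
    return [round(ms / n) for n, ms in rows]


per_counter = ms_per_counter(samples)
print(per_counter)
```

&lt;p&gt;The rounded values match the ms/counter column of the table, which makes the superlinear trend easy to see at a glance.&lt;/p&gt;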

&lt;p&gt;We currently rely on being able to query all counters within a few seconds at most, and this is blocking us from upgrading beyond 3.2.  One temporary workaround would be to issue multiple smaller queries for those counters. However, even then the query performance looks too poor for users.  Could you suggest a better schema or approach for this type of problem: storing many HLL counters over time and computing their unions at query time?  I&#8217;d also love a feature to perform parallel (element-wise) operations on arrays in the $group stage. E.g. given two arrays [1,4,2] and [2,3,1], compute [2,4,2].&lt;/p&gt;
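
&lt;p&gt;To make the requested array operation concrete, here is a minimal client-side sketch in plain Python. It is not MongoDB syntax; the helper name elementwise_max is hypothetical, and it assumes all arrays have the same length (zip silently truncates to the shortest otherwise).&lt;/p&gt;

```python
def elementwise_max(*arrays):
    """Return the position-wise maximum across equally sized arrays."""
    return [max(values) for values in zip(*arrays)]


# The two example arrays from the paragraph above
merged = elementwise_max([1, 4, 2], [2, 3, 1])
print(merged)  # [2, 4, 2]
```

&lt;p&gt;Taking the maximum bucket by bucket is the same operation the generated query performs with one $max per field in the $group stage.&lt;/p&gt;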

&lt;p&gt;The ideal outcome would be for performance to be restored to pre-3.3.9 levels.&lt;/p&gt;

&lt;p&gt;If you need more information, please let me know. Also, thanks for regularly publishing Docker images; it made comparing versions a breeze.&lt;/p&gt;</description>
                <environment>Just a mid-2014 MacBook Pro&lt;br/&gt;
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz&lt;br/&gt;
16GB RAM&lt;br/&gt;
APPLE SSD SM0256F&lt;br/&gt;
OS X 10.12.6&lt;br/&gt;
</environment>
        <key id="416075">SERVER-30633</key>
            <summary>Large performance regression for large aggregation queries</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="mark.agarunov">Mark Agarunov</assignee>
                                    <reporter username="danielgrigg">Daniel Grigg</reporter>
                        <labels>
                            <label>Bug</label>
                    </labels>
                <created>Mon, 14 Aug 2017 04:49:27 +0000</created>
                <updated>Mon, 9 Oct 2017 17:01:53 +0000</updated>
                            <resolved>Thu, 31 Aug 2017 16:33:13 +0000</resolved>
                                    <version>3.3.9</version>
                    <version>3.4.4</version>
                    <version>3.5.11</version>
                                                    <component>Aggregation Framework</component>
                                        <votes>0</votes>
                                    <watches>17</watches>
                                                                                                                <comments>
                            <comment id="1661904" author="mark.agarunov" created="Thu, 31 Aug 2017 16:32:42 +0000"  >&lt;p&gt;Hello &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=danielgrigg&quot; class=&quot;user-hover&quot; rel=&quot;danielgrigg&quot;&gt;danielgrigg&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thank you for the additional information. As this behavior looks to be due to the same issue as &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-30449&quot; title=&quot;ProjectionSpecValidator is O(N**2) in number of fields in the projection&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-30449&quot;&gt;&lt;del&gt;SERVER-30449&lt;/del&gt;&lt;/a&gt;, I&apos;ve closed this ticket as a duplicate. Please follow &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-30449&quot; title=&quot;ProjectionSpecValidator is O(N**2) in number of fields in the projection&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-30449&quot;&gt;&lt;del&gt;SERVER-30449&lt;/del&gt;&lt;/a&gt; for updates on this issue.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Mark&lt;/p&gt;</comment>
                            <comment id="1658095" author="danielgrigg" created="Sun, 27 Aug 2017 23:46:21 +0000"  >&lt;p&gt;Hi Charlie,&lt;/p&gt;

&lt;p&gt;I re-ran the attached query with the projection stage removed and your hypothesis is indeed correct!  Without the $project the query completed in 370ms.  Running the original again still took 48s.&lt;/p&gt;</comment>
                            <comment id="1657650" author="charlie.swanson" created="Fri, 25 Aug 2017 21:09:54 +0000"  >&lt;p&gt;Hi all,&lt;/p&gt;

&lt;p&gt;I have a suspicion that this was caused by my work in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18966&quot; title=&quot;Allow exclusion in $project stage of aggregation pipeline&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18966&quot;&gt;&lt;del&gt;SERVER-18966&lt;/del&gt;&lt;/a&gt;. To support exclusions in agg&apos;s $project I had to basically re-write the whole thing. Your project stage has a lot of fields in it, and I suspect I introduced an inefficiency in parsing, or somewhere in traversing the projection during execution. If my hypothesis is correct, the aggregation should speed up significantly without the $project stage - is this the case?&lt;/p&gt;

&lt;p&gt;In fact, it might be the case that you are experiencing &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-30449&quot; title=&quot;ProjectionSpecValidator is O(N**2) in number of fields in the projection&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-30449&quot;&gt;&lt;del&gt;SERVER-30449&lt;/del&gt;&lt;/a&gt;, which describes an inefficiency in the way we check for conflicting $project specifications (like _id: 0, &quot;_id.x&quot;: 0).&lt;/p&gt;</comment>
                            <comment id="1653463" author="danielgrigg" created="Tue, 22 Aug 2017 02:09:08 +0000"  >&lt;p&gt;Hi @mark.agarunov thanks for looking into this &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="1648056" author="mark.agarunov" created="Mon, 14 Aug 2017 20:47:26 +0000"  >&lt;p&gt;Hello &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=danielgrigg&quot; class=&quot;user-hover&quot; rel=&quot;danielgrigg&quot;&gt;danielgrigg&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thank you for the report. I&apos;ve been able to reproduce this issue using the data and query you&apos;ve provided and am currently investigating possible causes for this behavior.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Mark&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="410789">SERVER-30449</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="210447">SERVER-18966</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="163194" name="data.tgz" size="1177975" author="danielgrigg" created="Mon, 14 Aug 2017 04:42:44 +0000"/>
                            <attachment id="163716" name="explain.3.3.8" size="17149125" author="mark.agarunov" created="Mon, 21 Aug 2017 22:43:32 +0000"/>
                            <attachment id="163717" name="explain.3.3.9" size="17149760" author="mark.agarunov" created="Mon, 21 Aug 2017 22:48:38 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 14 Aug 2017 20:47:26 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        6 years, 23 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>backlog-server-pm</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            6 years, 23 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>charlie.swanson@mongodb.com</customfieldvalue>
            <customfieldvalue>danielgrigg</customfieldvalue>
            <customfieldvalue>mark.agarunov</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htcxa7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|ht4oen:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;I&apos;ve attached data.tgz containing a dump of a collection (with a single document) and query1.js, the query demonstrating the bug. &lt;/p&gt;

&lt;p&gt;$ docker run -d -p23380:27017 --name mongo-3.3.8 -v $PWD/data:/data  mongo:3.3.8&lt;br/&gt;
$ docker exec  mongo-3.3.8 mongorestore --db skyfii_reporting --collection WifiSession_Daily /data/dump/skyfii_reporting/WifiSession_Daily.bson&lt;br/&gt;
$ time docker exec mongo-3.3.8 mongo skyfii_reporting /data/query1.js&lt;/p&gt;

&lt;p&gt;&#8230;&lt;br/&gt;
real	0m1.806s&lt;br/&gt;
user	0m0.018s&lt;br/&gt;
sys	0m0.020s&lt;/p&gt;


&lt;p&gt;$ docker run -d -p23440:27017 --name mongo-3.4.4 -v $PWD/data:/data  mongo:3.4.4&lt;br/&gt;
$ docker exec  mongo-3.4.4 mongorestore --db skyfii_reporting --collection WifiSession_Daily /data/dump/skyfii_reporting/WifiSession_Daily.bson&lt;br/&gt;
$ time docker exec mongo-3.4.4 mongo skyfii_reporting /data/query1.js&lt;/p&gt;

&lt;p&gt;&#8230;&lt;br/&gt;
real	0m42.306s&lt;br/&gt;
user	0m0.832s&lt;br/&gt;
sys	0m0.194s&lt;/p&gt;


&lt;p&gt;$ docker run -d -p23511:27017 --name mongo-3.5.11 -v $PWD/data:/data  mongo:3.5.11&lt;br/&gt;
$ docker exec  mongo-3.5.11 mongorestore --db skyfii_reporting --collection WifiSession_Daily /data/dump/skyfii_reporting/WifiSession_Daily.bson&lt;br/&gt;
$ time docker exec mongo-3.5.11 mongo skyfii_reporting /data/query1.js&lt;br/&gt;
&#8230;&lt;br/&gt;
real	0m40.918s&lt;br/&gt;
user	0m0.018s&lt;br/&gt;
sys	0m0.020s&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htcjd3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>