<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:37:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-15291] slow &apos;$group&apos; performance</title>
                <link>https://jira.mongodb.org/browse/SERVER-15291</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;It seems that some $group queries that could be answered with index-only scans are instead performing unnecessary full collection scans.&lt;/p&gt;

&lt;p&gt;Here are more details, as I originally described in &lt;a href=&quot;http://stackoverflow.com/questions/24980525/mongodb-slow-group-performance&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://stackoverflow.com/questions/24980525/mongodb-slow-group-performance&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I have a MongoDB collection of over 1,000,000 documents. Each document is around 20 KB, so the total collection size is around 20 GB.&lt;/p&gt;
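&lt;p&gt;As a quick sanity check on the figures above, the total can be recomputed from the document count and size. This sketch assumes that 20K means 20 KiB (20 * 1024 bytes) per document and ignores storage overhead; it only shows the scale of data a full collection scan has to read.&lt;/p&gt;
&lt;pre&gt;
```python
# Back-of-the-envelope size of the collection described above.
# Assumption: "20K" per document means 20 KiB = 20 * 1024 bytes.
NUM_DOCS = 1_000_000
DOC_SIZE_BYTES = 20 * 1024

total_bytes = NUM_DOCS * DOC_SIZE_BYTES  # roughly what a full collection scan reads
total_gib = total_bytes / 1024**3        # the same amount expressed in GiB

print(f"{total_bytes:,} bytes, about {total_gib:.1f} GiB")
```
&lt;/pre&gt;
&lt;p&gt;This is consistent with the roughly 20 GB quoted above.&lt;/p&gt;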

&lt;p&gt;The collection has a &apos;type&apos; field, which takes around 10 distinct values, and there is an index on that field. I would like to get the per-type counts for the whole collection.&lt;/p&gt;

&lt;p&gt;I&apos;ve tested two approaches (in Python syntax):&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;A naive method, issuing one &apos;count&apos; call per distinct value:
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;counters = {}
for type_val in my_db.my_colc.distinct(&apos;type&apos;):&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    counters[type_val] = my_db.my_colc.find({&apos;type&apos; : type_val}).count()&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;&lt;/li&gt;
	&lt;li&gt;Using the aggregation framework with a &apos;$group&apos; stage:
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;counters = my_db.my_colc.aggregate([{&apos;$group&apos; :  {&apos;_id&apos;: &apos;$type&apos;, &apos;agg_val&apos;: { &apos;$sum&apos;: 1 } }}])&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;&lt;/li&gt;
&lt;/ul&gt;
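&lt;p&gt;The two approaches above can be written as small, self-contained functions. This is a sketch assuming PyMongo 3.7 or later, where Collection.count_documents replaces the Cursor.count() call used above (deprecated in PyMongo 3.7 and removed in 4.0) and aggregate returns an iterable cursor. The function names are illustrative, not part of PyMongo.&lt;/p&gt;
&lt;pre&gt;
```python
"""Sketch of the two per-type counting approaches from this report.

Assumes PyMongo 3.7+ and a collection whose documents have an indexed
'type' field. The function names are illustrative, not part of PyMongo.
"""


def count_per_type_naive(coll, field="type"):
    """Approach 1: one count per distinct value; each count can use the index."""
    return {value: coll.count_documents({field: value})
            for value in coll.distinct(field)}


def group_pipeline(field="type"):
    """Approach 2: a single $group stage keyed on the field."""
    return [{"$group": {"_id": "$" + field, "agg_val": {"$sum": 1}}}]


def count_per_type_group(coll, field="type"):
    """Run the $group pipeline and flatten its output into {value: count}."""
    return {doc["_id"]: doc["agg_val"]
            for doc in coll.aggregate(group_pipeline(field))}


# Usage (requires a running server, so left commented out):
# from pymongo import MongoClient
# my_colc = MongoClient().my_db.my_colc
# print(count_per_type_naive(my_colc))
# print(count_per_type_group(my_colc))
```
&lt;/pre&gt;
&lt;p&gt;Both functions return a dict mapping each type value to its document count; they differ only in how the server has to reach the data.&lt;/p&gt;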


&lt;p&gt;The first approach is roughly 45 times faster than the second (about 1 minute vs. 45 minutes). This seems to be because count can be answered from the index alone, without accessing the documents, while $group has to go over the documents one by one.&lt;/p&gt;
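&lt;p&gt;One way to check the explanation above is to compare the plan stages each operation uses. The sketch below walks an explain document and collects its stage names. It assumes the explain format of MongoDB 3.0 and later, where plan nodes carry a stage field such as COLLSCAN, IXSCAN, FETCH or COUNT_SCAN; because the nesting differs between server versions and between find, count and aggregate explain output, it does not rely on a fixed path, and it skips rejectedPlans so that only the chosen plan is considered. plan_stages and reads_documents are hypothetical helpers.&lt;/p&gt;
&lt;pre&gt;
```python
# Explain subtrees that describe plans the server did NOT choose.
SKIP_KEYS = {"rejectedPlans", "allPlansExecution"}


def plan_stages(explain_doc):
    """Collect every 'stage' name in an explain document, depth-first."""
    stages = []

    def walk(node):
        if isinstance(node, dict):
            stage = node.get("stage")
            if isinstance(stage, str):
                stages.append(stage)
            for key, value in node.items():
                if key not in SKIP_KEYS:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(explain_doc)
    return stages


def reads_documents(explain_doc):
    """True if the chosen plan contains a stage that loads whole documents."""
    return any(s in ("COLLSCAN", "FETCH") for s in plan_stages(explain_doc))


# Usage against a live server (hypothetical; output shape varies by version):
# agg_plan = my_db.command("aggregate", "my_colc", pipeline=pipeline, explain=True)
# count_plan = my_db.command("explain", {"count": "my_colc", "query": {"type": type_val}})
# print(reads_documents(agg_plan), reads_documents(count_plan))
```
&lt;/pre&gt;
&lt;p&gt;If reads_documents is True for the $group pipeline but False for the per-value count, that supports the explanation above.&lt;/p&gt;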

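&lt;p&gt;For the question below, one direction is suggested by the title of SERVER-11447, the ticket this issue was closed as a duplicate of: aggregation can sort using an index to speed up a group on an indexed field. The sketch builds such a pipeline, assuming an ascending index on &apos;type&apos;. Sorting on the indexed field first gives the planner a reason to use that index, and projecting only &apos;type&apos; means later stages need no other field. Whether the server then avoids fetching the documents depends on its version, so the plan should be checked with explain rather than assumed. index_friendly_group_pipeline is a hypothetical helper name.&lt;/p&gt;
&lt;pre&gt;
```python
def index_friendly_group_pipeline(field="type"):
    """Build a $sort, $project, $group pipeline counting documents per value.

    Hypothetical helper. Assumes an ascending index on `field` exists.
    """
    return [
        # Sort on the indexed field first so the index can provide the order.
        {"$sort": {field: 1}},
        # Keep only the grouped field; later stages need nothing else.
        {"$project": {"_id": 0, field: 1}},
        # Same grouping and counter name as approach 2 above.
        {"$group": {"_id": "$" + field, "agg_val": {"$sum": 1}}},
    ]


# Usage with PyMongo (requires a running server, so left commented out):
# counters = {d["_id"]: d["agg_val"]
#             for d in my_db.my_colc.aggregate(index_friendly_group_pipeline())}
```
&lt;/pre&gt;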
&lt;p&gt;Is there a way to run an efficient grouping query that uses only the &apos;type&apos; index, matching the performance of approach #1 but through the aggregation framework?&lt;/p&gt;</description>
                <environment></environment>
        <key id="158725">SERVER-15291</key>
            <summary>slow &apos;$group&apos; performance</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="baruchoxman">Baruch Oxman</reporter>
                        <labels>
                    </labels>
                <created>Wed, 17 Sep 2014 19:27:27 +0000</created>
                <updated>Fri, 26 Sep 2014 17:43:36 +0000</updated>
                            <resolved>Fri, 26 Sep 2014 16:38:05 +0000</resolved>
                                    <version>2.6.1</version>
                                                    <component>Performance</component>
                                        <votes>0</votes>
                                    <watches>4</watches>
                    <comments>
                            <comment id="728247" author="ramon.fernandez" created="Fri, 26 Sep 2014 17:43:36 +0000"  >&lt;p&gt;Sorry, that was a typo; thanks for catching that, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=baruchoxman&quot; class=&quot;user-hover&quot; rel=&quot;baruchoxman&quot;&gt;baruchoxman&lt;/a&gt;. The right ticket is &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-11447&quot; title=&quot;aggregation can sort using index to speed up group of an indexed field&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-11447&quot;&gt;&lt;del&gt;SERVER-11447&lt;/del&gt;&lt;/a&gt;, as listed in the &quot;Issue Links&quot; section. I&apos;ve edited my previous comment as well to avoid further confusion.&lt;/p&gt;</comment>
                            <comment id="728230" author="baruchoxman" created="Fri, 26 Sep 2014 17:31:07 +0000"  >&lt;p&gt;Ramon, are you sure this is the correct duplicate ticket number?&lt;br/&gt;
The title for &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-11477&quot; title=&quot;RHEL 5.7 durability - closeall.js failure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-11477&quot;&gt;&lt;del&gt;SERVER-11477&lt;/del&gt;&lt;/a&gt; is &quot;RHEL 5.7 durability - closeall.js failure&quot;...&lt;/p&gt;</comment>
                            <comment id="728176" author="ramon.fernandez" created="Fri, 26 Sep 2014 16:38:05 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=baruchoxman&quot; class=&quot;user-hover&quot; rel=&quot;baruchoxman&quot;&gt;baruchoxman&lt;/a&gt;, this ticket is a duplicate of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-11447&quot; title=&quot;aggregation can sort using index to speed up group of an indexed field&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-11447&quot;&gt;&lt;del&gt;SERVER-11447&lt;/del&gt;&lt;/a&gt;. Feel free to tune in there for updates.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="96234">SERVER-11447</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="73561">SERVER-9507</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 17 Sep 2014 20:13:55 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        9 years, 20 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ian@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            9 years, 20 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>baruchoxman</customfieldvalue>
            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrlnrz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hs2b5z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>138179</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrllkf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>