<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:07:29 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-24815] Merging aggregation pipeline strategy should be configurable</title>
                <link>https://jira.mongodb.org/browse/SERVER-24815</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18925&quot; title=&quot;Merging part of aggregation pipeline should be performed on a random shard to distribute the load&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18925&quot;&gt;&lt;del&gt;SERVER-18925&lt;/del&gt;&lt;/a&gt; distributed the merge stage of aggregation to a random shard rather than overloading the primary shard for the database.&lt;/p&gt;

&lt;p&gt;On a very large cluster this can max out the number of connections, because every shard can be sending data to every other shard. It&apos;s therefore desirable to add the ability to always merge on the primary shard, or to limit which shards can be chosen as the merger.&lt;/p&gt;</description>
                <environment></environment>
        <key id="296949">SERVER-24815</key>
            <summary>Merging aggregation pipeline strategy should be configurable</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-optimization">Backlog - Query Optimization</assignee>
                                    <reporter username="asya.kamsky@mongodb.com">Asya Kamsky</reporter>
                        <labels>
                            <label>performance</label>
                    </labels>
                <created>Mon, 27 Jun 2016 16:10:01 +0000</created>
                <updated>Tue, 6 Dec 2022 04:22:44 +0000</updated>
                                            <version>3.2.0</version>
                                                    <component>Aggregation Framework</component>
                    <component>Sharding</component>
                                        <votes>1</votes>
                                    <watches>22</watches>
                                                                                                                <comments>
                            <comment id="1451556" author="david.storch" created="Wed, 7 Dec 2016 18:39:19 +0000"  >&lt;p&gt;Hi all, I&apos;d like to provide an update on the status of this ticket. The long-term change tracked by this ticket would be to build a complete mechanism by which operators can specify rules for load-balancing aggregation work in a sharded cluster. How exactly to design and build such a feature requires more work, and is not scheduled to be fixed in any of the currently supported stable branches. In the short-term, we have provided a simple on/off configuration switch which can be used to control whether merging work is done on a random shard or is always done on the primary shard: see &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-27304&quot; title=&quot;Create setParameter to make all aggregations which require merging select the primary shard as the merger&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-27304&quot;&gt;&lt;del&gt;SERVER-27304&lt;/del&gt;&lt;/a&gt;. We are currently evaluating whether this fix can safely be included in a future minor release of 3.2.&lt;/p&gt;

&lt;p&gt;There are a few other open tickets tracking ideas for related improvements that may be of interest:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-27283&quot; title=&quot;Sharded aggregations that need merging should only consider for merging the shards that have documents to contribute&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-27283&quot;&gt;SERVER-27283&lt;/a&gt;, as described by my colleague &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=asya&quot; class=&quot;user-hover&quot; rel=&quot;asya&quot;&gt;asya&lt;/a&gt; above.&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-22760&quot; title=&quot;Sharded aggregation pipelines which involve taking a simple union should merge on mongos&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-22760&quot;&gt;&lt;del&gt;SERVER-22760&lt;/del&gt;&lt;/a&gt;. This work will prevent any shard from acting as the merger altogether for some aggregation pipelines.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="1449576" author="asya" created="Mon, 5 Dec 2016 19:45:21 +0000"  >&lt;p&gt;I filed &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-27283&quot; title=&quot;Sharded aggregations that need merging should only consider for merging the shards that have documents to contribute&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-27283&quot;&gt;SERVER-27283&lt;/a&gt; to separately track work for the system to automatically consider only shards which contributed data to the aggregation for merging (currently only shards &lt;b&gt;targeted&lt;/b&gt; by the aggregation are considered; the new ticket will track not considering shards which were targeted but didn&apos;t have any results to contribute).&lt;/p&gt;</comment>
                            <comment id="1447875" author="stuart.hall@masternaut.com" created="Fri, 2 Dec 2016 16:54:57 +0000"  >&lt;p&gt;Thanks Charlie. I&apos;ve performed my own testing here and I can confirm that this behaviour does indeed happen. Thanks for pointing this out.&lt;/p&gt;

&lt;p&gt;The one area where we still have a concern is when the query touches several shards, or is non-targeted (i.e. not using the shard key). In this case, it may still result in undesirable behaviour in that the aggregation merge stage will run on an unspecified host which may not be adequately resourced to handle it. My view here is that some form of additional steering would be desirable, as it would allow the administrator to take ultimate control of where the merge role runs in the case of unevenly specified (or geographically distributed) clusters.&lt;/p&gt;</comment>
                            <comment id="1438108" author="charlie.swanson" created="Fri, 18 Nov 2016 23:11:21 +0000"  >&lt;p&gt;I just looked back at the code which controls this, and it looks to me like it does pick randomly from the shards &lt;b&gt;involved in the aggregation&lt;/b&gt;. Is there some evidence of someone getting a shard doing the merge that didn&apos;t participate?&lt;/p&gt;

&lt;p&gt;If there is nothing requiring that the primary shard runs the second half (like an $out or a $lookup), the mongos will &lt;a href=&quot;https://github.com/mongodb/mongo/blob/r3.2.11-rc1/src/mongo/s/commands/cluster_pipeline_cmd.cpp#L242&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;select randomly from &lt;tt&gt;shardResults&lt;/tt&gt;&lt;/a&gt;, which is &lt;a href=&quot;https://github.com/mongodb/mongo/blob/r3.2.11-rc1/src/mongo/s/commands/cluster_pipeline_cmd.cpp#L178&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;populated with the shards which participated in the aggregation&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="1437423" author="stuart.hall@masternaut.com" created="Fri, 18 Nov 2016 10:47:33 +0000"  >&lt;p&gt;Following discussions this week, I thought I would document a couple of other specific cases where this issue would produce highly undesirable results:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;In a Zone Sharded cluster, it may result in the merge operation being executed on a shard that is geographically distant from the data shards, resulting in unnecessary cross-data-centre traffic and a consequent reduction in performance&lt;/li&gt;
	&lt;li&gt;In a cluster managed with tag ranges, not all shards may be of equal performance, e.g. archive shards may be relatively underpowered, reflecting the low query workload that they receive. In this case, performance-critical aggregations may end up merging on these shards, and with &quot;allowDiskUse&quot; queries this would result in an inconsistent, substantial degradation in query performance.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;For both of these cases, the solution is similar: some sort of steering is required over where the merge runs. Possibly automatic for case #1 (e.g. running close to the correct zone) or manual for case #2 (tag certain shards that are suitable for running merges).&lt;/p&gt;

&lt;p&gt;In our environment, we are running configuration #2 and we&apos;re currently reviewing whether this is a blocker for a 3.2.x upgrade.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="336776">SERVER-27304</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="336507">SERVER-27283</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="210028">SERVER-18925</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25126"><![CDATA[Query Optimization]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[500A000000Vn0wiIAB]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 27 Jun 2016 18:05:34 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        7 years, 10 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            7 years, 10 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>asya.kamsky@mongodb.com</customfieldvalue>
            <customfieldvalue>backlog-query-optimization</customfieldvalue>
            <customfieldvalue>charlie.swanson@mongodb.com</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>stuart.hall@masternaut.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrk41z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2enj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrj6vb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>