[SERVER-24815] Merging aggregation pipeline strategy should be configurable
| Created: | 27/Jun/16 | Updated: | 06/Dec/22 |
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Sharding |
| Affects Version/s: | 3.2.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | performance |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Optimization |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
On a very large cluster, allowing any shard to act as the merging shard can max out the number of connections, since every shard may end up sending data to every other shard. It would therefore be desirable to add the ability to always merge on the primary shard, or to limit the set of shards to which the merge role can be delegated.
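As a rough illustration of the connection fan-out described above (the shard counts below are hypothetical, not from this ticket): if any of N shards may be chosen as the merger, each shard can eventually hold connections to every other shard, so the number of potential shard-to-shard connection paths grows on the order of N × (N − 1), versus roughly N − 1 when a single designated shard always merges.

```python
# Back-of-the-envelope sketch only (not MongoDB code): potential shard-to-shard
# connection paths when any shard may be picked as the merger, versus always
# merging on one designated (e.g. primary) shard.
def potential_paths(num_shards: int) -> dict:
    return {
        "any_shard_may_merge": num_shards * (num_shards - 1),  # every shard -> every other shard
        "single_merge_shard": num_shards - 1,                  # every shard -> one merger
    }

for n in (10, 50, 100):
    print(n, potential_paths(n))
```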
| Comments |
| Comment by David Storch [ 07/Dec/16 ] |
Hi all, I'd like to provide an update on the status of this ticket. The long-term change tracked by this ticket would be to build a complete mechanism by which operators can specify rules for load-balancing aggregation work in a sharded cluster. Exactly how to design and build such a feature requires more work, and it is not scheduled to be fixed in any of the currently supported stable branches. In the short term, we have provided a simple on/off configuration switch which can be used to control whether merging work is done on a random shard or is always done on the primary shard: see the linked ticket. There are a few other open tickets tracking ideas for related improvements that may be of interest.
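A minimal sketch of how an operator might inspect and flip such an on/off switch via setParameter from Python follows. The parameter name used here is an assumption based on this comment, not something confirmed by the ticket, so verify the exact knob name (and whether it is set on mongos or mongod) for your server version before relying on it; only the getParameter/setParameter command shape is standard.

```python
# Hedged sketch: toggling a server knob that forces merging onto the primary
# shard. The parameter name below is an ASSUMPTION (check your version's
# parameter list); the getParameter/setParameter command shape is standard.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical mongos address
admin = client.admin

# Read the current value (raises if the knob does not exist on this version).
current = admin.command("getParameter", 1, internalQueryAlwaysMergeOnPrimaryShard=1)
print(current)

# Force all merging work onto the primary shard.
admin.command("setParameter", 1, internalQueryAlwaysMergeOnPrimaryShard=True)
```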
| Comment by Asya Kamsky [ 05/Dec/16 ] |
I filed SERVER-27283 to separately track work for the system to automatically consider only the shards which contributed data to the aggregation when choosing a merging shard (currently all shards targeted by the aggregation are considered; the new ticket will track excluding shards which were targeted but didn't have any results to contribute).
| Comment by Stuart Hall [ 02/Dec/16 ] |
Thanks Charlie. I've performed my own testing here and can confirm that this behaviour does indeed happen; thanks for pointing this out. The one area where we still have concern is when the query touches several shards, or is non-targeted (i.e. not using the shard key). In that case it may still result in undesirable behaviour, in that the aggregation merge stage will run on an unspecified host which may not be adequately resourced to handle it. My view is that some form of additional steering would be desirable, as it would allow the administrator to take ultimate control of where the merge role runs in the case of unevenly specified (or geographically distributed) clusters.
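To make the targeted vs. non-targeted distinction above concrete, here is a hedged sketch; the collection "orders", the shard key field "region", and the pipelines are hypothetical examples, not from this ticket. A $match on the shard key lets mongos route the whole pipeline to one shard, while a pipeline without it is scatter-gathered across all targeted shards and the merging half must then run on some host.

```python
# Hedged sketch of targeted vs. scatter-gather aggregations. The collection
# "orders" and shard key field "region" are hypothetical.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").shop  # hypothetical mongos + database

# Targeted: the $match on the shard key lets mongos route this to one shard,
# so there is little or no merging work whose placement matters.
targeted = db.orders.aggregate([
    {"$match": {"region": "EU"}},
    {"$group": {"_id": "$status", "n": {"$sum": 1}}},
])

# Non-targeted: no shard key predicate, so every shard runs the first half of
# the pipeline and some host must run the merging half ($group across shards).
scatter_gather = db.orders.aggregate([
    {"$group": {"_id": "$status", "n": {"$sum": 1}}},
])
```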
| Comment by Charlie Swanson [ 18/Nov/16 ] |
I just looked back at the code which controls this, and it looks to me like it does pick randomly from the shards involved in the aggregation. Is there some evidence of a shard doing the merge that didn't participate? If there is nothing requiring that the primary shard run the second half (such as an $out or a $lookup), mongos will select randomly from shardResults, which is populated with the shards which participated in the aggregation.
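Below is a deliberately simplified Python model of the selection rule described in this comment. It is not the actual mongos C++ code; the set of primary-only stages and the shape of shardResults are assumptions made for illustration.

```python
# Simplified, illustrative model of the merge-shard choice described above.
import random

# Stages assumed (per the comment above) to force the merge onto the primary shard.
PRIMARY_ONLY_STAGES = {"$out", "$lookup"}

def pick_merge_shard(pipeline, shard_results, primary_shard):
    """shard_results: the shards that participated in (returned cursors for)
    the first half of the split pipeline."""
    needs_primary = any(stage_name in PRIMARY_ONLY_STAGES
                        for stage in pipeline
                        for stage_name in stage)
    if needs_primary:
        return primary_shard
    # Otherwise pick randomly among the participating shards only.
    return random.choice(list(shard_results))

# Example: a plain $group pipeline may merge on any participating shard.
print(pick_merge_shard([{"$group": {"_id": "$x"}}],
                       shard_results={"shard0", "shard2"},
                       primary_shard="shard1"))
```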
| Comment by Stuart Hall [ 18/Nov/16 ] |
Following discussions this week, I thought I would document a couple of other specific cases where this issue would produce extremely undesirable results:
For both of these cases the solution is similar: some form of steering over where the merge runs is required. This could be automatic for case #1 (e.g. running the merge close to the correct zone) or manual for case #2 (tag certain shards as suitable for running merges). In our environment we are running configuration #2, and we're currently reviewing whether this is a blocker for a 3.2.x upgrade.
For both of these cases, the solution is similar - some sort of steering is required in where the merge runs. Possible automatic for case #1 (e.g. running close to the correct zone) or manual for case #2 (tag certain shards that are suitable for running merges) In our environment, we are running configuration #2 and we're currently reviewing whether this is a blocker for a 3.2.x upgrade. |