[SERVER-22760] Sharded aggregation pipelines which involve taking a simple union should merge on mongos Created: 19/Feb/16  Updated: 20/May/21  Resolved: 08/Aug/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Sharding
Affects Version/s: None
Fix Version/s: 3.5.12

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Bernard Gorman
Resolution: Done Votes: 0
Labels: eng-l, optimization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-27937 pull apart the AsyncResultsMerger log... Closed
is depended on by SERVER-29141 Add sharding support for merging chan... Closed
Related
related to SERVER-27637 Merging phase of a distributed $sampl... Closed
related to SERVER-30412 mongos Segmentation fault during aggr... Closed
related to SERVER-32282 Aggregation text search returns text ... Closed
related to SERVER-32297 Aggregations that merge on mongos do ... Closed
related to SERVER-23955 Return what host the Shard::runComman... Backlog
related to SERVER-30480 Update aggregation explain format to ... Closed
related to SERVER-30871 Permit blocking aggregation stages to... Closed
is related to SERVER-18940 Optimise sharded aggregations that ar... Closed
is related to SERVER-142 Read-only views over collection data. Closed
is related to DOCS-11083 Create a table of where sorts happen ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 13 (04/22/16), Query 2017-07-10, Query 2017-07-31, Query 2017-08-21
Participants:

 Description   

Suppose you have a pipeline consisting of a single $match stage which is delivered to mongos by the client. Also suppose that mongos will target multiple shards in order to find the matching data. Currently, the aggregation system will establish a shard as the merging node. Instead, mongos could perform the merging (it already knows how to do this for .find() operations).

This would improve the latency of the aggregation operation, since it would eliminate a network hop. Data currently has to travel from a targeted shard, to the merging shard, to mongos, and back to the client. If mongos were responsible for merging, data would only have to travel from a targeted shard, to mongos, to the client.

Mongos can also merge sorted streams coming from the shards for .find() operations, so this optimization could be extended to $match + $sort pipelines as well.
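
To make the targeted case concrete, here is a minimal sketch in mongo shell syntax (the collection and field names are hypothetical):

db.coll.aggregate([ { $match: { status: "A" } } ])
// the equivalent find, whose results mongos already merges itself:
db.coll.find({ status: "A" })

// with sorted-stream merging, the same would apply to the $match + $sort form:
db.coll.aggregate([ { $match: { status: "A" } }, { $sort: { ts: -1 } } ])
db.coll.find({ status: "A" }).sort({ ts: -1 })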



 Comments   
Comment by Githook User [ 08/Aug/17 ]

Author:

{'username': 'gormanb', 'email': 'bernard.gorman@gmail.com', 'name': 'Bernard Gorman'}

Message: SERVER-22760 Sharded aggregation pipelines which involve taking a simple union should merge on mongos
Branch: master
https://github.com/mongodb/mongo/commit/9062f71621d053e32beec1061f708e3fd08b0158

Comment by Asya Kamsky [ 11/Mar/16 ]

"It can cause the same query to have very different execution based on how chunks are distributed."

That's okay, IMHO - I think the biggest win would be if find(expression) and an aggregate $match on the same expression had the same performance in a sharded cluster.

Comment by Andy Schwerin [ 09/Mar/16 ]

I agree with Charlie's preferred path. I don't think we should forgo optimizing agg latencies just because sometimes we cannot optimize them. The big reason in my mind when we moved the later pipeline stages to a merging mongod was to enable spilling to disk, but many queries don't need that, and we didn't want to spend the time back then to support running the end of the pipeline on either a mongod or a mongos. I think, as Mathias says, this was in large part because it was hard to create cursors on mongos at the time. If we build a more intelligent agg planner that detects when mongos can safely act as the merger, we'll maximize the utility of the system in the long run.

Comment by Mathias Stearn [ 09/Mar/16 ]

charlie.swanson, the issue with merging on mongos wasn't exactly that it used too much memory; it's that we wanted to support features that mongos couldn't handle, either at the time (returning cursors) or potentially ever ($out, allowDiskUse). Since we wanted to use cursors for all aggregations in new drivers, it didn't make sense for mongos to have a more optimal path that didn't support cursors. Mongos' cursor implementation was rewritten in 3.2, which makes it easy to support agg cursors now, so that argument disappears. There was also some concern previously that agg on mongos used more resources than the other operations a mongos performs, but it seems people are now more concerned with lowering the latency of operations than with keeping work off the mongos. However, we should discuss with the Product and Consulting teams before making another shift here.

There is an obvious trivial case of "Simple merge on mongos" that could be implemented first, even if we decide not to implement the full change right away. If the merge pipeline is empty after splitting (meaning all agg stages are performed on the shards), we can just use a simple forwarding/unioning cursor in mongos. This will be helpful for users who use agg by default even for operations that could be done with find. It lays simple groundwork that can be added to incrementally (e.g. just supporting $limit in the merger) until the full "Simple merge on mongos" is implemented.
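
A minimal illustration of that trivial case, in mongo shell syntax with hypothetical collection and field names:

// After splitting, both stages run on the shards and the merge pipeline is
// empty, so mongos only needs to union the shard cursors.
db.coll.aggregate([
    { $match: { status: "A" } },
    { $project: { _id: 0, status: 1, total: 1 } }
])

Appending { $limit: 10 } to such a pipeline would be the first incremental extension: each shard applies the limit locally, and the merger only has to stop after returning 10 documents.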

Also, there is a subtle downside to "Smarter non-exact match targeting". It can cause the same query to have very different execution based on how chunks are distributed. This could be a problem for users if a "hot" query suddenly transitions to using a merger following a migration. This is somewhat mitigated by the fact that the worst case in that world is exactly what we do now, but it still reduces the predictability of performance. Currently all execution choices in sharded agg are based solely on the "shape" of the pipeline, and not on the actual values or distribution of chunks across shards.

cc asya richard.kreuter

Comment by Charlie Swanson [ 09/Mar/16 ]

I just had a conversation with david.storch about some possible improvements to aggregation in a sharded cluster. There's this one, the linked SERVER-18940, and one other (not tracked in JIRA). Here's a summary of the different possibilities:

The four approaches compared below are: Existing behavior; Simple merge on mongos (this ticket: SERVER-22760); Merging + running agg stages on mongos (no existing ticket); and Smarter non-exact match targeting (SERVER-18940).

Description

Existing behavior: Any aggregation that is not an exact match on the shard key is split into two pieces: the part that can be run in parallel across all shards, and the rest of the pipeline, which needs to use the results merged from the different shards. Some shard is designated as the merging shard; it coordinates merging the results of the first part and runs the rest of the pipeline.

Simple merge on mongos: If the merging part of the aggregation is simple - combining sorted streams, taking a limit, or simply forwarding results in any order - we can use mongos' existing cursor-merging logic to do it. In these cases we can avoid passing results through a 'merging shard' and simply do the merge in the mongos.

Merging + running agg stages on mongos: For pipelines that need to be split in a sharded cluster, the half of the pipeline that runs post-merge is sometimes relatively simple. If there are no 'expensive' stages that use a lot of memory ($group, $sort, maybe others) and no stages that must run on the primary shard ($lookup, $out), then we can run the second half of the pipeline on the mongos.

Smarter non-exact match targeting: If the 'initial query' of the pipeline (usually just the first $match) targets documents that all live on the same shard, we could run the entire pipeline on that shard, except in cases of $out, etc.

My best guess of how it would be implemented

Existing behavior: N/A - already exists.

Simple merge on mongos: Medium-ish. Do the split; if the second half of the split only involves things that can be handled by mongos's logic for merging cursors, let the mongos do the merging. Note that we might have to add some logic to make sure sort keys are preserved when they reach the mongos, potentially inspecting a $project stage to see whether it is a 'simple' projection - one without any renames or computed fields.

Merging + running agg stages on mongos: Easy-ish, maybe medium. Do the normal pipeline splitting; if any stage after the split might use a lot of memory (non-merging $sort, $group) or needs a mongod ($lookup, $out), fall back to the existing agg scatter/merge. Otherwise, just have the mongos perform the 'merging' part of the pipeline. Note that all merging used to happen in the mongos before 2.6.

Smarter non-exact match targeting: Difficulty is unclear - see the comments on SERVER-18940.

Example pipeline: [$match, $sort], where the $match is a range query targeting a single shard

Existing behavior:
1. mongos issues $match, $sort to the targeted shard
2. the targeted shard responds with a cursor id
3. mongos forwards the cursor id to the chosen merging shard
4. the merging shard issues getMores on the cursor id
5. the shard responds with results
6. the merging shard 'merges' the results (there is only one stream to merge) and responds to mongos with the results

Simple merge on mongos:
1. $match, $sort is sent to the targeted shard
2. the targeted shard responds with results; mongos 'merges' the results using ClusterClientCursor (the mechanism used to merge results from find commands)

Merging + running agg stages on mongos:
1. $match, $sort is sent to the targeted shard
2. the targeted shard responds with results; mongos 'merges' the results using DocumentSourceMergeCursors (an internal aggregation stage)

Smarter non-exact match targeting:
1. the entire pipeline is forwarded to the targeted shard
2. mongos forwards the results from the aggregation to the client

Example pipeline: [$match, $sort], where the $match is a range query targeting multiple shards

Existing behavior:
1. mongos issues $match, $sort to the targeted shards
2. the targeted shards respond with cursor ids
3. mongos forwards the cursor ids to the chosen merging shard
4. the merging shard issues getMores on the cursor ids
5. the shards respond with results
6. the merging shard merges the results into a single stream (preserving sort order) and responds to mongos with the results

Simple merge on mongos:
1. the pipeline is rewritten as a find command, and mongos forwards the command to the targeted shards
2. the targeted shards respond with results; mongos merges the results using ClusterClientCursor (the mechanism used to merge results from find commands)

Merging + running agg stages on mongos:
1. $match, $sort is sent to the targeted shards
2. the targeted shards respond with results; mongos merges the results using DocumentSourceMergeCursors (an internal aggregation stage)

Smarter non-exact match targeting: N/A - same as existing behavior.

Example pipeline: [$match, $sort, $group], where the $match is a range query targeting a single shard

Existing behavior:
1. mongos issues $match, $sort to the targeted shard
2. the targeted shard responds with a cursor id
3. mongos forwards the cursor id to the chosen merging shard
4. the merging shard issues getMores on the cursor id
5. the shard responds with results
6. the merging shard 'merges' the results (there is only one stream to merge), performs the $group, and then responds to mongos with the results

Simple merge on mongos: N/A - there is non-trivial work to do after the merge.

Merging + running agg stages on mongos: N/A - $group is a blocking (potentially memory-intensive) stage.

Smarter non-exact match targeting:
1. the entire pipeline is forwarded to the targeted shard
2. mongos forwards the results from the aggregation to the client

Example pipeline: [$match, $sort, $project], where the first $match is a range query targeting multiple shards

Existing behavior:
1. mongos issues $match, $sort to the targeted shards
2. the targeted shards respond with cursor ids
3. mongos forwards the cursor ids to the chosen merging shard
4. the merging shard issues getMores on the cursor ids
5. the shards respond with results
6. the merging shard merges the results into a single stream (preserving sort order), computes the $project stage, and responds to mongos with the results

Simple merge on mongos: N/A - there is non-trivial work to do after the merge. Maybe the $project is a simple projection that could be moved in front of the sort, but that would be hard-ish to figure out and wouldn't always apply.

Merging + running agg stages on mongos:
1. $match, $sort is sent to the targeted shards
2. the targeted shards respond with results; mongos merges the results using DocumentSourceMergeCursors (an internal aggregation stage) and performs the $project

Smarter non-exact match targeting: N/A - same as existing behavior.
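
For concreteness, the example pipelines above might look like this in mongo shell syntax (the collection, shard key, and field names are hypothetical; assume db.coll is sharded on { shardKey: 1 }):

// [$match, $sort]: every stage can run on the shards, so the merge half is just
// a union of sorted streams - the case this ticket proposes to move onto mongos.
db.coll.aggregate([
    { $match: { shardKey: { $gte: 100, $lt: 200 } } },
    { $sort: { ts: -1 } }
])

// [$match, $sort, $group]: $group is a blocking stage, so the merge half is
// non-trivial and would still run on a merging shard under this ticket alone.
db.coll.aggregate([
    { $match: { shardKey: { $gte: 100, $lt: 200 } } },
    { $sort: { ts: -1 } },
    { $group: { _id: "$status", count: { $sum: 1 } } }
])

// Passing { explain: true } shows how the pipeline is split between the shards
// and the merger.
db.coll.aggregate(
    [ { $match: { shardKey: { $gte: 100, $lt: 200 } } }, { $sort: { ts: -1 } } ],
    { explain: true }
)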

I think the ideal scenario would be if we implemented both "Merging and running agg stages on mongos" and "Smarter non-exact match targeting".

It's my understanding that we moved away from doing merging in the mongos because stages like $group and $sort were taking up too much memory. Were there any other downsides to doing the merging on the mongos?
cc asya, richard.kreuter, redbeard0531
