[SERVER-14985] Merge stages in aggregation should be distributed beyond primary shard Created: 21/Aug/14  Updated: 12/Jun/15  Resolved: 08/Apr/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Henrik Ingo (Inactive) Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-17737 Support distributed merger for aggreg... Open
Related
related to DOCS-3064 update 2.6 aggregation and sharded co... Closed
is related to SERVER-18925 Merging part of aggregation pipeline ... Closed
Tested
Operating System: ALL
Participants:

 Description   

http://docs.mongodb.org/manual/core/aggregation-pipeline-sharded-collections/

"The second pipeline consists of the remaining pipeline stages and runs on the primary shard. The primary shard merges the cursors from the other shards and runs the second pipeline on these results. The primary shard forwards the final results to the mongos. In previous versions, the second pipeline would run on the mongos."

  • This prevents scaling out non-trivial aggregation pipeline queries
  • As a specific case, consider that when using $redact, then all of your queries become aggregation queries, so 100% of your reads will have to flow through the primary shard, even if they otherwise would be a targeted query.
  • Minor, but related: Note that selection of the primary shard is usually implicit, ie random from the user point of view and cannot be changed afterwards.


 Comments   
Comment by Daniel Pasette (Inactive) [ 08/Apr/15 ]

closing as duplicate of SERVER-17737

Comment by Mathias Stearn [ 21/Aug/14 ]

The reasons that the merge stages in agg were moved from the mongos to the primary shard are largely due to issues in mongos unrelated to agg that conflicted with our plans for 2.6. It was decided that fixing these issues was out-of-scope for 2.6.

  • Mongos isn't allowed to write to disk (except when opting in to log files) so we could not use the external sorter ("allowDiskUse") there.
  • Mongos doesn't really have any of its own cursors, it just forwards its requests to the shards. In particular cursors from a single shard are assumed to be from the primary shard for the database. It also has no mechanism to kill cursors, so if a client goes away before issuing a killCursor or exhausting the cursor, it will be leaked. Since agg cursors can be much heavier than other cursors, this was deemed unacceptable.
  • $out needs to write to the primary shard, so the data might as well go there. This is because sharded ouput was out-of-scope for 2.6, and all unsharded collections must live on the primary shard (SERVER-939). Note that map/reduce does have a sharded output, but there are a number of issues there (e.g., SERVER-12261, SERVER-7926, SERVER-14324), and we decided not to emulate the map/reduce approach to sharded output.

Additionally, agg was the only case where we did "heavy lifting" inside of mongos, both in terms of CPU and memory usage (m/r does all real work on the shards like 2.6 agg). This conflicted with the design principle that mongos should be very lightweight and can be run on users app servers without contending too heavily for resources.

Generated at Thu Feb 08 03:36:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.