[DOCS-7237] Ambiguity about Where the Merge Sort Occurs in a Sharded Cluster Created: 23/Feb/16  Updated: 14/Apr/22  Resolved: 05/Feb/18

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: William Cross Assignee: Susan Kerschbaumer (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-17737 Support distributed merger for aggreg... Open
Related
related to DOCS-13229 Conflicting info about Aggregation pi... Closed
related to DOCS-8582 Update v3.2, v3.0, and v2.6 versions ... Closed
related to DOCS-12238 Cursor sort and aggregation sort shar... Closed
is related to DOCS-11083 Create a table of where sorts happen ... Closed
Participants:
Days since reply: 1 year, 42 weeks, 6 days ago
Story Points: 0.5

 Description   

The documentation states, in different places, that the merge sort occurs either on the mongos or on the primary shard for the database. The documentation may need to be reconciled or, if I'm misunderstanding what's being communicated, clarified.

I believe that the mongos is the only component that merge-sorts the results of a .find() query.

Statements in favor of each:

  • the primary shard
    • "If the query specifies sorted results using the sort() cursor method, the mongos instance passes the $orderby option to the shards. The primary shard for the database receives and performs a merge sort for all results before returning the data to the client via the mongos."
  • the mongos
    • "If you call the cursor.sort() method on a query in a sharded environment, the mongod for each shard will sort its results, and the mongos merges each shard’s results before returning them to the client."

I also note that the explanation references the $orderby option, which, according to the documentation, is deprecated as of 3.2.



 Comments   
Comment by Jess Mokrzecki [ 14/Apr/22 ]

Fix Version updated for upstream SERVER-17737:

Comment by Ravind Kumar (Inactive) [ 09/Oct/18 ]

asya, is SERVER-22760 the ticket you were referencing? It seems like the logic that decides which node owns the merge sort for an aggregation pipeline has significant complexity.

Comment by Githook User [ 06/Feb/18 ]

Author: skerschb <sue.kerschbaumer@10gen.com>

Message: DOCS-7237 adding changes to sharded merge in 3.6
Branch: master
https://github.com/mongodb/docs/commit/1455aa5907f91f012e77cf17f3ae3e3bd4ca1b71

Comment by David Storch [ 11/Jan/18 ]

bernard.gorman completed a bunch of the work related to this in 3.6 and could be a good reviewer for proposed documentation changes.

Comment by Asya Kamsky [ 10/Jan/18 ]

This has changed in 3.6 for aggregation and that doesn't seem to be mentioned in the docs.

Comment by David Storch [ 07/Jul/16 ]

In 3.2.x versions:

  • For find operations with a .sort(), mongos will forward the sort to all participating shards. The sorted merge will then occur on mongos.
  • For aggregation operations, merging is always done on one of the shards. The shard which performs the merge is currently chosen at random amongst all shards in the cluster. Mongos never performs any merging for aggregation operations, but rather will simply forward the results it receives from the merging shard to the client application.

In older versions (I think 2.6.x?), mongos could also perform the merge for aggregation operations. For find operations, mongos is responsible for doing the sorted merge in all versions of MongoDB. Since both of the quoted passages document find (and not aggregation), the former passage (primary shard) appears to be inaccurate, whereas the latter (mongos) is correct.
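
For illustration only, here is a minimal Python sketch of the find-with-sort flow described above: each shard sorts its own results, and a single merging node (mongos, for find) performs a sorted merge of the already-sorted streams. This is a simulation, not MongoDB code. The function names `shard_sort` and `mongos_merge` and the sample documents are hypothetical, and `heapq.merge` merely stands in for whatever merge logic the server actually uses.

```python
import heapq
from operator import itemgetter


def shard_sort(docs, key):
    """Simulate one shard (mongod) sorting its own matching documents."""
    return sorted(docs, key=itemgetter(key))


def mongos_merge(sorted_shard_results, key):
    """Simulate the sorted merge: a k-way merge of pre-sorted streams.

    No full re-sort is needed because every input is already ordered.
    """
    return list(heapq.merge(*sorted_shard_results, key=itemgetter(key)))


# Hypothetical documents held by three shards.
shard_a = [{"_id": 1, "age": 41}, {"_id": 2, "age": 23}]
shard_b = [{"_id": 3, "age": 35}, {"_id": 4, "age": 19}]
shard_c = [{"_id": 5, "age": 28}]

# Step 1: the sort is forwarded to every participating shard.
per_shard = [shard_sort(s, "age") for s in (shard_a, shard_b, shard_c)]

# Step 2: the merging node combines the sorted streams.
merged = mongos_merge(per_shard, "age")
ages = [d["age"] for d in merged]
print(ages)
```

The point of the sketch is the division of labor: the expensive sorting happens in parallel on the shards, while the merging node only interleaves sorted inputs.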

Comment by Ravind Kumar (Inactive) [ 07/Jul/16 ]

david.storch, renctan, could either of you shed some light on which behavior is correct? Or are both possible depending on the situation?

Generated at Thu Feb 08 07:53:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.