[SERVER-4656] aggregation: optimize sort/limit combination to only sort top-n items Created: 10/Jan/12  Updated: 07/Apr/23  Resolved: 05/Dec/12

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 2.3.2

Type: Improvement Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Mathias Stearn
Resolution: Done Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-7781 geoNear aggregation pipeline stage Closed
Related
is related to SERVER-447 new aggregation framework Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

If a limit follows a sort, then we can optimize the sort by only retaining the top-n items required to satisfy the request.

This may use one of http://www.cplusplus.com/reference/algorithm/partial_sort/ or
https://github.com/mongodb/mongo/blob/master/src/mongo/db/scanandorder.cpp#L40-85 , or if those don't fit, maintain a priority queue, and always discard elements after the top-n.



 Comments   
Comment by auto [ 04/Dec/12 ]

Author:

{u'date': u'2012-12-04T15:13:54Z', u'email': u'mathias@10gen.com', u'name': u'Mathias Stearn'}

Message: SERVER-4656 optimize $sort followed by $limit to do top-k selection
Branch: master
https://github.com/mongodb/mongo/commit/27d7883db067d396cbe35ac93cad109b526630d8

Comment by Eystein Stenberg [ 01/May/12 ]

This looks very promising!
In the widespread comment-array structure [

{author: "joe", "text" : "my comment", votes: 3}

, ... ] it should significantly increase the speed.

I did a test with 100 collection-objects with 2000 comments each. I want to get the 10 most voted for comments like this:

db.coll.aggregate({$project : {comments : 1}}, {$unwind : "$comments"}, {$sort : { "comments.votes" : -1}}, {$limit : 10} )

This takes 5 seconds to complete, and significantly slows down with the input size. You suggestion should make this very fast. It should reduce the input to the final sort from 200 000 to just 1000 in my example (99.5% reduction).

Thank you!

Generated at Thu Feb 08 03:06:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.