[SERVER-2130] Ability to use Limit() with Distinct() Created: 22/Nov/10  Updated: 06/Apr/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Ed Rooth Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 53
Labels: query_triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

all


Issue Links:
Related
related to SERVER-25184 Add a $distinct aggregation stage Backlog
related to SERVER-4507 aggregation: optimize $group to take... Backlog
related to SERVER-27915 Make $group with $addToSet accumulato... Backlog
related to SERVER-9507 Optimize $sort+$group+$first pipeline... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

Ability to use Limit() with Distinct()... or any of the cursor methods for the matter (sort, count, skip, etc. although limit has the biggest impact).

Apparently there is no simple way to currently perform the following SQL query:

SELECT DISTINCT(myField) FROM myTable WHERE x = y LIMIT(20)

The current Distinct() implementation only allows for bringing back ALL distinct values in the collection or matching a query, but there is no way to limit these results. This would be very convenient and there are many use cases.



 Comments   
Comment by Devin Matte [ 05/Nov/20 ]

Is there any progress on this feature? Aggregate is an okay workaround however it's not ideal, hoping to be able to apply limit with distinct

Comment by Asya Kamsky [ 04/Jul/19 ]

4.2 has the fix SERVER-9507 - while it is not a direct solution to this problem, it will use an  efficient DISTINCT_SCAN of the index while doing $group: on an indexed field.

The real solution for $limit after $group being optimized will be SERVER-4507.

 

 

Comment by Asya Kamsky [ 13/Jun/18 ]

micseydel we still have plans to provide simpler syntax (and more performant execution) for what is requested here.

The two linked SERVER tickets describe the work that we hope to schedule in the future.

Comment by Michael Robert Seydel [ 12/Jun/18 ]

Is this still a possible feature? Given how long it's been open, and a "workaround" being provided, I would imagine this should be closed a Won't Complete.

Comment by Alexandru Baetu [ 17/Jul/17 ]

If you have 7 mil documents and you perform the collection.aggregate([{$group : {_id : "$field"}},{$limit:1}]) you will wait ~20 seconds. This is not ok. If i need a set of "field" values having the cardinal 1, i shouldn't wait that long.

Comment by Asya Kamsky [ 22/Feb/16 ]

axom you absolutely correct, I will fix or remove mine.

Comment by Brian Riley [ 22/Feb/16 ]

Asya's suggestion may have worked in a past iteration of mongo, but at least currently you will need to adjust the group stage to get the limit to do anything.

{$group: {_id: "$myField"}}

Comment by Asya Kamsky [ 09/Dec/14 ]

This can be done using aggregation framework as

edited

db.collection.aggregate([{$match:{<your-filter>}}, {$group:{_id:"$myField"}}},
/* optionally */ {$sort:{_id:1}} /* or by another field that was preserved in $group stage */
{$limit:20}])

Comment by mozillazg [ 09/Dec/14 ]

Is there any news about this feature request?

Comment by Ivan Fioravanti [ 11/Nov/14 ]

Any news on this feature request? It is quite useful in some scenario.

Comment by Scott Lowe [ 19/Feb/14 ]

Very surprised I couldn't do this. Last comment Sept 2012! Any update?

Comment by Ian Whalen (Inactive) [ 17/Sep/12 ]

jhgustafsson this has not been completed yet, and it isn't scheduled for the upcoming 2.4 release. We will re-evaluate this for upcoming releases as we begin preparing them.

Comment by Johan Gustafsson [ 16/Sep/12 ]

Is this feature resolved yet? If not, is it planned for any version? (and is the PHP-driver likely to be updated soon after?)

Comment by Dominik Gehl [ 03/Oct/11 ]

Sort() with Distinct() could also be very useful, for example in the following case:

Email messages and attachments (attachments saved in an array inside the email message document).
Getting the distinct list of attachment names ordered by message date would ask for the distinct on 'attachment name', but with a sort on 'message date'.

Comment by Gerhard Balthasar [ 14/Apr/11 ]

Using distinct with sort would be my biggest hope for 1.9.0 apart from limit.

Generated at Thu Feb 08 02:59:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.