[SERVER-25899] Make the distinct command always execute as an aggregation pipeline Created: 31/Aug/16  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-27915 Make $group with $addToSet accumulato... Backlog
Related
related to SERVER-3141 distinct needs to have a way to outpu... Closed
is related to SERVER-25184 Add a $distinct aggregation stage Backlog
is related to SERVER-25898 Make the count command always execute... Backlog
Assigned Teams:
Query Optimization
Participants:

 Description   

We already execute distinct over views as aggregations, but there is no reason that all distinct commands couldn't internally use the aggregation path.

As part of this work, we need to allow the aggregation system to make use of the DISTINCT_SCAN fast path for aggregations that are logically distinct operations.
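To make the idea concrete, here is a minimal sketch of how a distinct command could be expressed as an aggregation pipeline. It is an illustration only, not the server's actual translation: the helper name `distinct_as_pipeline` and the output field name `distinct` are hypothetical, and the use of `$unwind` with `preserveNullAndEmptyArrays` is an assumption made to approximate how distinct treats array elements and null values. Dotted paths that traverse nested arrays may behave differently.

```python
from typing import Any, Dict, List, Optional


def distinct_as_pipeline(
    key: str, query: Optional[Dict[str, Any]] = None
) -> List[Dict[str, Any]]:
    """Build a pipeline that approximates distinct(key, query).

    Hypothetical helper for illustration; not the server's translation.
    """
    pipeline: List[Dict[str, Any]] = []

    # The distinct command's query filter becomes a leading $match stage.
    if query:
        pipeline.append({"$match": query})

    # distinct treats each element of an array value as its own value,
    # so unwind the key first. Keeping null/empty inputs is an assumption
    # made so that explicit nulls still reach the $group stage.
    pipeline.append(
        {"$unwind": {"path": f"${key}", "preserveNullAndEmptyArrays": True}}
    )

    # Collapse everything into one document holding the set of values.
    pipeline.append(
        {"$group": {"_id": None, "distinct": {"$addToSet": f"${key}"}}}
    )
    return pipeline
```

A pipeline of this shape is what the optimizer would need to recognize as "logically distinct" in order to choose DISTINCT_SCAN. Note that it still accumulates all values into a single document, so it is subject to the same size limit as the distinct command's array result.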



 Comments   
Comment by David Golden [ 11/Nov/16 ]

Please consider addressing SERVER-3141 as part of any overhaul, i.e. a "distinct_unwrap" stage that outputs any number of documents rather than just a single array, which could overflow the BSON document size limit. If the distinct command needs to wrap that back up into an array, fine, but if we really want to support humongous data, we shouldn't continue to impose an arbitrary limit on the size of a distinct query's result.

Comment by Andy Schwerin [ 31/Aug/16 ]

If you do this, please do it to mongos as well.
