[SERVER-43893] $sample should desugar to $meta:"randVal" sort followed by $limit
Created: 08/Oct/19  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: David Storch
Assignee: Backlog - Query Optimization
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

When $sample cannot execute using the storage-engine optimized path, it falls back to a DocumentSourceSample stage. DocumentSourceSample delegates the actual work to a DocumentSourceSort stage, which it holds as a data member.

The query execution tree should generally compose operators by connecting them as nodes in a tree, rather than by making one operator a data member of another. We should therefore make $sample execute as a sort on $meta:"randVal" with a coalesced limit, where the limit equals the sample size given in the user's query. One way to achieve this is to desugar the user's $sample stage into a $sort followed by a $limit. This would rely on the planner to construct an execution tree that first generates the "randVal" metadata and then performs the $sort-$limit as a top-k sort.
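
To make the proposal concrete, here is a minimal Python sketch of the idea. It is not server code. It models a pipeline as a list of dicts, rewrites a $sample stage into a $sort plus a $limit, and then models the planned execution as a top-k selection over randomly tagged documents. The function names and the sort-key name `randSortKey` are hypothetical, and the rewritten $sort spec is only illustrative; this ticket does not establish whether the server accepts that exact syntax from users.

```python
import heapq
import random

# Hypothetical sort-key name; the ticket does not say which field name the
# desugared $sort would use.
RAND_SORT_KEY = "randSortKey"


def desugar_sample(pipeline):
    """Rewrite each {"$sample": {"size": n}} into a $sort plus a $limit."""
    out = []
    for stage in pipeline:
        if "$sample" in stage:
            size = stage["$sample"]["size"]
            # Sort on the per-document random metadata value...
            out.append({"$sort": {RAND_SORT_KEY: {"$meta": "randVal"}}})
            # ...and let the limit carry the user's sample size.
            out.append({"$limit": size})
        else:
            out.append(stage)
    return out


def top_k_by_rand_val(docs, k, rng):
    """Model the planned execution: tag each document with a random value,
    then keep the k documents with the largest values (a top-k sort).

    The sort direction is an arbitrary choice here; either direction
    selects k documents at random.
    """
    tagged = ((rng.random(), index, doc) for index, doc in enumerate(docs))
    # heapq.nlargest selects the top k without requiring a full sort
    # of the input when k is small.
    best = heapq.nlargest(k, tagged, key=lambda entry: entry[0])
    return [doc for _, _, doc in best]
```

For example, `desugar_sample([{"$sample": {"size": 3}}])` yields a two-stage pipeline, and `top_k_by_rand_val(docs, 3, random.Random())` returns three distinct documents from `docs`. Keeping the rewrite and the execution as separate functions mirrors the separation the ticket asks for: the stages are composed side by side rather than one being hidden inside the other.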



 Comments   
Comment by David Storch [ 13/May/20 ]

I'm removing this ticket from the "Unify PlanStage and DocumentSource" project but keeping it on the backlog. I don't think there is a strong motivation for scheduling it right now.
