- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Aggregation Framework, Querying
- Labels: None
- Query Optimization
When $sample cannot use the storage-engine optimized path, it executes as a DocumentSourceSample stage. DocumentSourceSample does its work by delegating to a DocumentSourceSort stage that it holds as a data member.
Query execution should generally compose operators by connecting them as separate nodes in the execution tree, not by embedding one operator as a data member of another. We should therefore make $sample execute as a sort on the $meta: "randVal" metadata with a coalesced limit, where the limit equals the sample size requested in the user's query. One way to achieve this is to desugar the user's $sample stage into a $sort followed by a $limit. This would rely on the planner to construct an execution tree that first generates the "randVal" metadata and then performs the $sort-$limit as a top-k sort.
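As a rough illustration of the proposed desugaring, here is a minimal Python sketch (not server code). It models rewriting a sample stage into a sort on a random value followed by a limit, then executing that pair as a top-k sort. The function names (`desugar_sample`, `attach_rand_val`, `top_k_by_rand_val`, `execute`) and the dictionary shapes of the stages are hypothetical and chosen only for this example; they are not the server's internal representation.

```python
import heapq
import random


def desugar_sample(size):
    """Rewrite a hypothetical {$sample: {size: N}} into a sort stage plus a limit stage."""
    if size <= 0:
        raise ValueError("sample size must be positive")
    return [
        # Illustrative shape only: sort on the per-document random metadata.
        {"$sort": {"$meta": "randVal"}},
        {"$limit": size},
    ]


def attach_rand_val(docs, rng):
    """Model the step that generates the "randVal" metadata for each document."""
    for doc in docs:
        yield (rng.random(), doc)


def top_k_by_rand_val(tagged_docs, k):
    """Model a sort with a coalesced limit: keep only the k largest random values."""
    # heapq.nlargest holds at most k items at a time, so the full input
    # never has to be sorted or buffered. The sort direction is arbitrary here.
    best = heapq.nlargest(k, tagged_docs, key=lambda pair: pair[0])
    return [doc for _, doc in best]


def execute(pipeline, docs, rng):
    """Run the two-stage desugared pipeline as a single top-k sort."""
    _sort_stage, limit_stage = pipeline
    return top_k_by_rand_val(attach_rand_val(docs, rng), limit_stage["$limit"])


if __name__ == "__main__":
    docs = [{"_id": i} for i in range(1000)]
    pipeline = desugar_sample(5)
    print(pipeline)
    print(execute(pipeline, docs, random.Random()))
```

In this sketch the random-value generation and the top-k sort are separate functions connected by the caller, which mirrors the ticket's goal of composing operators in a tree rather than nesting one inside another.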