$sample should desugar to $meta:"randVal" sort followed by $limit


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Query Optimization
    • 3

      When $sample cannot use the storage-engine optimized path, it executes as a DocumentSourceSample stage. DocumentSourceSample in turn delegates to a DocumentSourceSort stage, which it holds as a data member.

      The query execution tree should generally compose operators by connecting them as nodes in the tree, not by making one operator a data member of another. We should therefore make $sample execute as a sort on $meta:"randVal" with a coalesced limit, where the limit is the sample size given in the user's query. One way to achieve this is to desugar the user's $sample stage into a $sort followed by a $limit. This would rely on the planner to construct an execution tree that first generates the "randVal" metadata and then performs the $sort-$limit as a top-k sort.
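
      The proposed rewrite can be illustrated with a minimal sketch in Python. This is not server code: `desugar_sample`, `run_top_k`, and the sort key name `_rand` are hypothetical names chosen for illustration, and the exact sort specification the server would use is an assumption. The sketch only shows the shape of the idea: $sample becomes a $sort on random metadata followed by a $limit, and the pair is evaluated as a top-k sort without fully sorting the input.

```python
import heapq
import random


def desugar_sample(stage):
    """Rewrite {"$sample": {"size": N}} into a $sort on random metadata plus a $limit.

    The sort key name "_rand" is a placeholder for this sketch only.
    """
    size = stage["$sample"]["size"]
    return [
        {"$sort": {"_rand": {"$meta": "randVal"}}},
        {"$limit": size},
    ]


def run_top_k(docs, pipeline, rng=None):
    """Simulate the desugared pipeline as a single top-k sort.

    Each document is tagged with a random value (standing in for the
    "randVal" metadata), then the k documents with the largest values
    are kept. The index in each tuple breaks ties so that documents
    themselves are never compared.
    """
    rng = rng or random.Random()
    _sort_stage, limit_stage = pipeline
    k = limit_stage["$limit"]
    tagged = [(rng.random(), i, doc) for i, doc in enumerate(docs)]
    return [doc for _, _, doc in heapq.nlargest(k, tagged)]


if __name__ == "__main__":
    pipeline = desugar_sample({"$sample": {"size": 3}})
    docs = [{"_id": i} for i in range(10)]
    print(pipeline)
    print(run_top_k(docs, pipeline))
```

      In this sketch the sort and the limit are separate entries in the pipeline, and it is the evaluator (standing in for the planner) that decides to run them together as one bounded top-k operation, rather than one stage owning the other as a member.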

              Assignee:
              [DO NOT USE] Backlog - Query Optimization
              Reporter:
              David Storch
              Votes:
              0
              Watchers:
              4

                Created:
                Updated: