[SERVER-54220] Implement $sample fallback in $_internalUnpackBucket. Created: 02/Feb/21  Updated: 29/Oct/23  Resolved: 12/Apr/21

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 5.0 Required
Fix Version/s: 5.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Eric Cox (Inactive) Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
Backwards Compatibility: Fully Compatible
Sprint: Query Execution 2021-03-22, Query Execution 2021-04-05, Query Execution 2021-04-19
Participants:

 Description   

If the heuristics fall below a determined threshold, the $sample pushdown algorithm should fall back to the top-k sorting algorithm that $sample currently uses. The fallback unpacks all of the buckets and then proceeds with the top-k sort.

We should also explore the heuristic space, determine a minimal set of significant variables, and implement a threshold switch that toggles between the fallback and the optimized $sample algorithm in $_internalUnpackBucket.
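
For illustration only, the fallback path described above (unpack every bucket, then take a random top-k) might look roughly like the Python sketch below. This is not the server's C++ implementation. The bucket layout (a dict with a "measurements" list) and the names unpack_buckets and sample_top_k are hypothetical, and tagging each measurement with a random key is just one common way to realize a random top-k sort.

```python
import heapq
import random


def unpack_buckets(buckets):
    """Yield every measurement from every bucket.

    Hypothetical layout: each bucket is a dict with a "measurements" list.
    """
    for bucket in buckets:
        yield from bucket["measurements"]


def sample_top_k(buckets, k, rng=None):
    """Return up to k measurements chosen uniformly at random.

    Each unpacked measurement gets a random key; the k largest keys win.
    The running index breaks ties so measurements are never compared.
    """
    rng = rng or random.Random()
    keyed = (
        (rng.random(), index, measurement)
        for index, measurement in enumerate(unpack_buckets(buckets))
    )
    return [measurement for _, _, measurement in heapq.nlargest(k, keyed)]
```

The cost of this path grows with the total number of measurements, since every bucket is unpacked regardless of the sample size, which is why it serves as the fallback rather than the default.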



 Comments   
Comment by Githook User [ 12/Apr/21 ]

Author: Eric Cox <eric.cox@mongodb.com> (ericox)

Message: SERVER-54220 Change heuristic for bailing out of SAMPLE_FROM_TIMESERIES_BUCKET plan

If the requested sample size for a $sample against a timeseries
collection exceeds 1% of the maximum possible number of measurements
(numBuckets * maxMeasurementsPerBucket), then we never consider using
the ARHASH sampling algorithm.

Co-authored-by: David Storch <david.storch@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/4741ef2d5ca6a0809c2569cc55142bb1f7ed7547
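
As a rough sketch of the bail-out rule in the commit message above, the check could be expressed as follows. This is Python for illustration, not the server code. The function name should_consider_arhash is hypothetical, and the per-bucket maximum of 1000 used in the usage note is an example value, not a figure taken from this ticket.

```python
def should_consider_arhash(sample_size, num_buckets, max_measurements_per_bucket):
    """Return False when the sample size exceeds 1% of the maximum
    possible number of measurements, True otherwise.

    Integer arithmetic (sample_size * 100) avoids floating-point rounding
    at the exact 1% boundary.
    """
    max_possible_measurements = num_buckets * max_measurements_per_bucket
    return sample_size * 100 <= max_possible_measurements
```

With 50 buckets and an assumed maximum of 1000 measurements per bucket, the maximum possible count is 50,000, so a requested sample of 500 would still be eligible for ARHASH while a sample of 501 would not.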
