[SERVER-45474] $sample doesn't support variables or expression operators Created: 10/Jan/20 Updated: 23/Jun/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.2.2 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Luke Prochazka | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | qopt-team |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Optimization |
| Case: | (copied to CRM) |
| Description |
|
The current $sample stage only accepts a number for the size parameter, per: https://github.com/mongodb/mongo/blob/master/src/mongo/db/pipeline/document_source_sample.cpp#L102 It would be nice to allow variables/expressions here, in the same way that they are almost universally available elsewhere in aggregation. |
| Comments |
| Comment by Luke Prochazka [ 04/Oct/21 ] |
|
asya, the use case I had in mind was the ability to supply the sample size parameter dynamically, derived from the collection size, i.e. to sample a proportion of the collection rather than a fixed number of documents irrespective of the collection size. This makes more statistical sense and offers more control over the storage engine behaviour. Here is a sample shell procedure for illustration purposes:
Resulting in the message: "MongoServerError: size argument to $sample must be a number" The reproduction script above uses the new v5.0 let aggregation option, which postdates this Jira's initial request but provides a simpler and more elegant example for illustration purposes. A pre-5.0 version would need a more convoluted pipeline, using an uncorrelated $lookup subquery to $collStats or a similar technique to push the derived document count ahead of the initial $sample. Having the ability to perform the above would, for example, go a long way to obviating the need for SERVER-38802. |
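A minimal sketch of the pattern discussed in this comment, not the reporter's original script. It shows the shape of a pipeline that passes the size through a variable (the form this ticket requests) and a client-side workaround that computes a proportional size first and embeds it as a literal number. The helper names `proportionalSampleSize` and `buildSamplePipeline`, the variable name `sampleSize`, the collection name `coll`, and the 10% fraction are all assumptions made for illustration.

```javascript
// Hypothetical helper (not from the ticket): derive a sample size as a
// fraction of the collection's document count.
function proportionalSampleSize(docCount, fraction) {
  if (!Number.isInteger(docCount) || docCount < 0) {
    throw new RangeError("docCount must be a non-negative integer");
  }
  if (!(fraction > 0 && fraction <= 1)) {
    throw new RangeError("fraction must be in (0, 1]");
  }
  // Round to an integer and keep at least 1 so the stage always receives
  // a positive integer literal.
  return Math.max(1, Math.round(docCount * fraction));
}

// The form the ticket asks for: size supplied through a variable.
// Per the comment above, the server rejects this with
// "size argument to $sample must be a number".
const requestedPipeline = [{ $sample: { size: "$$sampleSize" } }];

// Client-side workaround: compute the size first, then embed it as a literal.
function buildSamplePipeline(docCount, fraction) {
  return [{ $sample: { size: proportionalSampleSize(docCount, fraction) } }];
}

// Usage in mongosh, assuming a collection named `coll`:
//   const n = db.coll.estimatedDocumentCount();
//   db.coll.aggregate(buildSamplePipeline(n, 0.1));
```

The workaround costs an extra round trip for the count and the count can be stale by the time the aggregation runs, which is part of why evaluating the size server-side would be preferable.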
| Comment by Asya Kamsky [ 25/May/21 ] |
|
$sample applies to the entire incoming stream of documents, so what expression would it be useful to use here? |