[SERVER-22225] Allow seed to be specified during $sample. Created: 19/Jan/16  Updated: 06/Apr/23

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Dissatisfied Former User Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 1
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

There are several tickets currently open relating to statistical bias (SERVER-22068) and incorrect seeding (SERVER-22069). This request is for the ability to pass an explicit PRNG seed to the $sample pipeline stage.

Several situations come immediately to mind where the ability to specify a seed explicitly would be beneficial. The first, which I mentioned on IRC, is an e-commerce system showing "random featured products" that rotate daily. Using the current Julian day number as the seed would rotate the selection daily and allow both historical and future (predictive) results to be generated. This is similar to the way games such as Minecraft accept a seed for algorithmic world generation.

The syntax would become:

    { $sample: { size: <positive integer>, seed: <integer> } }

If the seed is omitted, it could itself be generated by a PRNG.
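
To illustrate the daily-rotation use case, here is a minimal sketch of how a client might build such a stage. It assumes the proposed `seed` field, which is not implemented by any server release, and it uses whole days since the Unix epoch (UTC) as a simple stand-in for a date-based seed. The names `dailySeed` and `featuredProductsPipeline` are hypothetical.

```javascript
// Milliseconds in one UTC day.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Derive an integer seed that changes once per UTC day.
// Assumption: days since the Unix epoch are an acceptable date-based seed.
function dailySeed(date) {
  return Math.floor(date.getTime() / MS_PER_DAY);
}

// Build a pipeline using the PROPOSED syntax from this ticket.
// The `seed` field is hypothetical and does not exist in $sample today.
function featuredProductsPipeline(date, size) {
  return [{ $sample: { size: size, seed: dailySeed(date) } }];
}

// Hypothetical usage, valid only if this request were implemented:
//   db.products.aggregate(featuredProductsPipeline(new Date(), 5))
```

Passing a past or future date to `featuredProductsPipeline` would, under this proposal, reproduce or predict that day's selection, provided the underlying collection is unchanged.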



 Comments   
Comment by Justin Knight [ 01/Dec/20 ]

I was looking to use $sample to select a subset of data for machine learning, but being able to specify a seed is one of the requirements. As a result, I will have to implement sampling manually in the application. +1 for this enhancement request.
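
For reference, manual sampling in the application could look roughly like the sketch below. It is not part of the server and not necessarily what any commenter implemented: it pairs a small seedable generator (the widely used mulberry32 routine, chosen here only for illustration) with a partial Fisher-Yates shuffle. The function names are hypothetical, and results are reproducible only if the input array arrives in a stable order, for example sorted by `_id`.

```javascript
// Small seedable PRNG (mulberry32). Returns a function yielding floats in [0, 1).
// Not cryptographically secure; used only to make sampling repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick `size` distinct items from `items` using a partial Fisher-Yates shuffle.
// The same seed and the same input order always give the same sample.
function seededSample(items, size, seed) {
  const rand = mulberry32(seed);
  const pool = items.slice(); // copy so the caller's array is left untouched
  const n = Math.min(size, pool.length);
  for (let i = 0; i < n; i++) {
    // Choose a random index from the not-yet-selected tail [i, pool.length).
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```

This approach requires loading the candidate documents (or at least their identifiers) into the application first, which is the overhead a server-side seed would avoid.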

Generated at Thu Feb 08 03:59:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.