Details
Task
Resolution: Done
Major - P3
Description
Previously, using $sample in an uncorrelated subquery behaved inconsistently: depending on the size of its output, the $sample result was either cached or re-run. The same applied to $rand.
Now, a sub-pipeline that contains $sample or $rand is no longer treated as "uncorrelated" for caching purposes, so $lookup always re-runs it.
Description of Linked Ticket
The $sample stage returns a different sample every time it runs. $lookup sometimes re-runs the inner pipeline per outer document, and sometimes runs it only once. This makes the behavior of $sample inside $lookup hard to predict.
For example, this query runs the sub-pipeline only once, so every outer document gets the same sample:
{$lookup: {
    from: 'foreign_coll',
    pipeline: [
        {$sample: {size: 5}},
    ],
    as: 'docs',
}}
On the other hand, this query re-runs the sub-pipeline, choosing a different sample per outer document:
{$lookup: {
    from: 'foreign_coll',
    let: {outer: "$_id"},
    pipeline: [
        {$match: {$expr: {$lt: ["$_id", "$$outer"]}}}, // correlation predicate
        {$sample: {size: 3}},
    ],
    as: 'docs',
}}
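The difference between the two cases can be sketched with a toy model in plain JavaScript. This is only an illustration of "run once and reuse" versus "re-run per outer document"; `sampleDocs` and `toyLookup` are hypothetical helpers, not MongoDB APIs, and the model ignores the output-size condition mentioned above.

```javascript
// Toy model of $lookup with an uncorrelated-pipeline cache.
// Illustrative only: none of these names are MongoDB APIs.

// Pick `size` random elements without replacement (partial Fisher-Yates shuffle).
function sampleDocs(docs, size) {
  const pool = docs.slice();
  const n = Math.min(size, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Attach the inner pipeline's result to each outer document.
// With useCache = true, the inner pipeline runs once and its result is reused.
function toyLookup(outerDocs, innerPipeline, useCache) {
  let runs = 0;
  let cached = null;
  const results = outerDocs.map((outer) => {
    if (!useCache || cached === null) {
      cached = innerPipeline(outer);
      runs++;
    }
    return { ...outer, docs: cached };
  });
  return { results, runs };
}

const foreign = Array.from({ length: 100 }, (_, i) => ({ _id: i }));
const outer = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
const inner = () => sampleDocs(foreign, 5);

const cachedRun = toyLookup(outer, inner, true);  // same sample for every outer doc
const rerun = toyLookup(outer, inner, false);     // fresh sample per outer doc
console.log(cachedRun.runs, rerun.runs); // prints: 1 3
```

In the cached case every outer document receives the identical sample, which is the surprising behavior described for the first query.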
Since we consider DocumentSourceSequentialDocumentCache to be an optimization, there could be other exceptions to this rule. For example, if you add a dummy correlation hoping to force the inner pipeline to re-run, it can get optimized out.
This ticket makes any sub-pipeline that contains a $sample stage, or a stage containing a $rand or $sampleRate expression, ineligible for uncorrelated pipeline caching.
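As a rough sketch of that eligibility rule, the check below scans a pipeline for the three operators named in this ticket. It is an assumption-laden illustration in JavaScript, not the server's implementation: `hasNonCacheableOperator` is a hypothetical helper, and it does a purely syntactic key scan (so it would also flag an operator name that appears inside a literal).

```javascript
// Operators this ticket names as making a sub-pipeline ineligible for caching.
const NON_CACHEABLE = new Set(["$sample", "$rand", "$sampleRate"]);

// Hypothetical helper: recursively search a pipeline, stage, or expression
// for any of the operators above used as an object key.
function hasNonCacheableOperator(value) {
  if (Array.isArray(value)) {
    return value.some(hasNonCacheableOperator);
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).some(
      ([key, child]) => NON_CACHEABLE.has(key) || hasNonCacheableOperator(child)
    );
  }
  return false; // strings, numbers, booleans, null
}

// A $sample stage anywhere in the sub-pipeline disqualifies it.
console.log(hasNonCacheableOperator([{ $sample: { size: 5 } }])); // true

// So does a $rand expression nested inside another stage.
console.log(
  hasNonCacheableOperator([{ $match: { $expr: { $lt: [{ $rand: {} }, 0.5] } } }])
); // true

// A pipeline without these operators is unaffected by this rule.
console.log(hasNonCacheableOperator([{ $match: { status: "A" } }])); // false
```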
Issue Links
documents: SERVER-49024 Disallow $lookup uncorrelated pipeline caching for stages containing $sample/$rand/$sampleRate (Closed)