[SERVER-49024] Disallow $lookup uncorrelated pipeline caching for stages containing $sample/$rand/$sampleRate
Created: 22/Jun/20  Updated: 29/Oct/23  Resolved: 10/Nov/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug
Priority: Major - P3
Reporter: David Percy
Assignee: David Percy
Resolution: Fixed
Votes: 0
Labels: qopt-team

Issue Links:
Documented
is documented by DOCS-13989 Investigate changes in SERVER-49024: ... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Sprint: Query 2020-09-07, Query 2020-11-02, Query 2020-11-16

 Description   

The $sample stage returns a different sample every time it runs. $lookup sometimes re-runs the inner pipeline for each outer document, and sometimes runs it only once and reuses the cached result. This makes the behavior of $sample inside $lookup hard to predict.

For example, this uncorrelated query runs the sub-pipeline only once, so every outer document receives the same sample:

{$lookup: {
	from: 'foreign_coll',
	pipeline: [
		{$sample: {size: 5}},
	],
	as: 'docs',
}}

By contrast, this correlated query re-runs the sub-pipeline for each outer document, choosing a different sample each time:

{$lookup: {
	from: 'foreign_coll',
	let: {outer: "$_id"},
	pipeline: [
		{$match: {$expr: {$lt:["$_id", "$$outer"]}}},  // correlation predicate
		{$sample: {size: 3}},
	],
	as: 'docs',
}}
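The difference between the two queries can be modeled outside the server. The sketch below is a toy simulation, not MongoDB code: `sample` and `simulateLookup` are hypothetical helpers, and the in-memory arrays stand in for the outer collection and `foreign_coll`. With caching on, the inner pipeline runs once and every outer document shares its result; with caching off, the inner pipeline runs again for each outer document.

```javascript
// Toy model only: this is not MongoDB server code. It contrasts a
// $lookup whose inner result is cached with one that re-runs the inner
// pipeline for every outer document.

// Hypothetical stand-in for {$sample: {size}}: returns up to `size`
// distinct documents chosen uniformly at random (partial Fisher-Yates).
function sample(docs, size) {
  const copy = docs.slice();
  const n = Math.min(size, copy.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

// Hypothetical $lookup simulator. With cacheInner = true the inner
// pipeline runs once and its result is reused for every outer document.
function simulateLookup(outerDocs, foreignDocs, innerPipeline, cacheInner) {
  let cached; // inner result from the first run, when caching is on
  return outerDocs.map((outer) => {
    let docs;
    if (cacheInner) {
      if (cached === undefined) cached = innerPipeline(outer, foreignDocs);
      docs = cached;
    } else {
      docs = innerPipeline(outer, foreignDocs);
    }
    return { ...outer, docs };
  });
}

const foreignColl = Array.from({ length: 100 }, (_, i) => ({ _id: i }));
const outerColl = Array.from({ length: 10 }, (_, i) => ({ _id: i }));

// Like the first query: uncorrelated, so the result is cached.
const uncorrelatedRun = simulateLookup(
  outerColl, foreignColl, (_outer, docs) => sample(docs, 5), true);

// Like the second query: correlated on _id, so it re-runs per document.
const correlatedRun = simulateLookup(
  outerColl, foreignColl,
  (outer, docs) => sample(docs.filter((d) => d._id < outer._id), 3),
  false);
```

In `uncorrelatedRun` every outer document holds the very same `docs` array, while in `correlatedRun` each outer document gets its own independently drawn sample of foreign documents with a smaller `_id`.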

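Because DocumentSourceSequentialDocumentCache (the stage that caches the uncorrelated prefix of the inner pipeline) is treated as an optimization, there may be other cases where a sub-pipeline's caching behavior differs from what its correlation suggests. For example, a dummy correlation added in the hope of forcing the inner pipeline to re-run can be optimized away, leaving the results cached.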

This ticket makes any $sample stage, and any stage that uses $rand or $sampleRate, ineligible for uncorrelated pipeline caching.
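The eligibility rule can be illustrated with a small sketch. This is a hypothetical model, not the server's C++ implementation: pipelines are assumed to be plain JS objects, `isCacheableStage` and `cacheablePrefixLength` are invented names, and correlation is approximated by looking for string references to `$$<letVar>`. It only captures the idea that caching covers the leading stages that are both uncorrelated and free of $sample, $rand and $sampleRate.

```javascript
// Hypothetical sketch, not MongoDB server code. Pipelines are modeled
// as arrays of plain objects, e.g. [{$sample: {size: 5}}].

const NONDETERMINISTIC_OPERATORS = new Set(["$rand", "$sampleRate"]);

// True if any key anywhere inside `value` names one of `operators`.
function containsOperator(value, operators) {
  if (Array.isArray(value)) {
    return value.some((v) => containsOperator(v, operators));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).some(
      ([key, v]) => operators.has(key) || containsOperator(v, operators));
  }
  return false;
}

// True if any string inside `value` references one of the `let`
// variables, e.g. "$$outer" or "$$outer.field" (assumed correlation test).
function referencesLetVar(value, letVars) {
  if (typeof value === "string") {
    return letVars.some(
      (v) => value === "$$" + v || value.startsWith("$$" + v + "."));
  }
  if (Array.isArray(value)) return value.some((x) => referencesLetVar(x, letVars));
  if (value !== null && typeof value === "object") {
    return Object.values(value).some((x) => referencesLetVar(x, letVars));
  }
  return false;
}

// A stage may be cached only if it is not $sample and uses no $rand/$sampleRate.
function isCacheableStage(stage) {
  if ("$sample" in stage) return false;
  return !containsOperator(stage, NONDETERMINISTIC_OPERATORS);
}

// Number of leading stages that could be served from the cache: scanning
// stops at the first correlated or nondeterministic stage.
function cacheablePrefixLength(pipeline, letVars = []) {
  let n = 0;
  for (const stage of pipeline) {
    if (referencesLetVar(stage, letVars) || !isCacheableStage(stage)) break;
    n++;
  }
  return n;
}

console.log(cacheablePrefixLength([{ $sample: { size: 5 } }])); // 0
console.log(cacheablePrefixLength([{ $match: { x: 1 } }, { $sample: { size: 5 } }])); // 1
console.log(cacheablePrefixLength([{ $match: { $sampleRate: 0.33 } }])); // 0
```

Under this model, the first query from the description has no cacheable prefix at all, so its $sample would be re-evaluated for each outer document, which matches the intent of the change.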



 Comments   
Comment by Githook User [ 10/Nov/20 ]

Author: David Percy <david.percy@mongodb.com> (username: dpercy)

Message: SERVER-49024 Disallow $lookup caching of stages containing $rand, $sample
Branch: master
https://github.com/mongodb/mongo/commit/8011b6129fc08a7dcbd675da737e63a22f1ef362

Generated at Thu Feb 08 05:18:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.