[DOCS-13989] Investigate changes in SERVER-49024: Disallow $lookup uncorrelated pipeline caching for stages containing $sample/$rand/$sampleRate
Created: 17/Nov/20  Updated: 13/Nov/23  Resolved: 16/Aug/21

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 4.9.0, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task
Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team
Assignee: Jason Price
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-49024 Disallow $lookup uncorrelated pipelin... Closed
Participants:
Days since reply: 2 years, 25 weeks, 2 days ago
Epic Link: DOCSP-15042
Story Points: 3

 Description   

Downstream Change Summary

Previously, $sample in an uncorrelated $lookup subquery behaved inconsistently: depending on the size of its output, the $sample result was either cached or re-run. $rand was affected in the same way.

Now, a subquery that contains $sample, $rand, or $sampleRate is never treated as uncorrelated, so $lookup always re-runs it.

Description of Linked Ticket

The $sample stage returns a different sample every time it runs. $lookup sometimes re-runs the inner pipeline per outer document, and sometimes runs it only once. This makes the behavior of $sample inside $lookup hard to predict.

For example, this query runs the sub-pipeline only once, so every outer document gets the same sample:

{$lookup: {
	from: 'foreign_coll',
	pipeline: [
		{$sample: {size: 5}},
	],
	as: 'docs',
}}

On the other hand, this query re-runs the sub-pipeline, choosing a different sample per outer document:

{$lookup: {
	from: 'foreign_coll',
	let: {outer: "$_id"},
	pipeline: [
		{$match: {$expr: {$lt:["$_id", "$$outer"]}}},  // correlation predicate
		{$sample: {size: 3}},
	],
	as: 'docs',
}}
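
The difference between the two queries above can be modeled with a small toy simulation. This is only a sketch under stated assumptions: simulateLookup, sampleWithoutReplacement, and the cacheResult option are hypothetical names invented for illustration, and the code does not reflect how the server implements $lookup or DocumentSourceSequentialDocumentCache.

```javascript
// Toy model only. All names are hypothetical; this is not server code.

// Pick `size` documents at random using a partial Fisher-Yates shuffle.
function sampleWithoutReplacement(docs, size, random = Math.random) {
  const pool = docs.slice();
  const n = Math.min(size, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Join each outer document with the sub-pipeline result.
// With cacheResult: true the sub-pipeline runs once and its output is reused;
// with cacheResult: false it runs again for every outer document.
function simulateLookup(outerDocs, runSubPipeline, { cacheResult }) {
  let runs = 0;
  let cached = null;
  const joined = outerDocs.map((outer) => {
    let docs;
    if (cacheResult && cached !== null) {
      docs = cached;
    } else {
      docs = runSubPipeline(outer);
      runs += 1;
      if (cacheResult) cached = docs;
    }
    return { ...outer, docs };
  });
  return { joined, runs };
}

const foreign = Array.from({ length: 100 }, (_, i) => ({ _id: i }));
const outer = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];
const subPipeline = () => sampleWithoutReplacement(foreign, 5);

const cachedRun = simulateLookup(outer, subPipeline, { cacheResult: true });
const rerun = simulateLookup(outer, subPipeline, { cacheResult: false });

console.log(cachedRun.runs); // 1: one sample shared by all outer documents
console.log(rerun.runs);     // 3: a fresh sample per outer document
```

In the cached run every outer document holds the same sample, which mirrors the first query; the re-run case mirrors the second.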

Because we consider DocumentSourceSequentialDocumentCache to be an optimization, there could be other exceptions to this rule. For example, a dummy correlation added to force the inner pipeline to re-run can be optimized out.

This ticket makes any $sample stage, and any stage containing a $rand or $sampleRate expression, ineligible for uncorrelated pipeline caching.
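
As a rough illustration of that rule, the check below walks a pipeline and reports whether it mentions any of the three operators. It is a simplified sketch, not the server's implementation: containsNonDeterministicOperator and NON_DETERMINISTIC_OPERATORS are hypothetical names, the sample pipelines are made up, and matching on object keys alone is an assumption that ignores edge cases such as operators wrapped in $literal.

```javascript
// Simplified sketch; all names here are hypothetical, not MongoDB APIs.
const NON_DETERMINISTIC_OPERATORS = new Set(["$sample", "$rand", "$sampleRate"]);

// Recursively walk a pipeline (or any nested stage or expression) and report
// whether any object key is one of the operators listed above.
function containsNonDeterministicOperator(node) {
  if (Array.isArray(node)) {
    return node.some((child) => containsNonDeterministicOperator(child));
  }
  if (node !== null && typeof node === "object") {
    return Object.entries(node).some(
      ([key, child]) =>
        NON_DETERMINISTIC_OPERATORS.has(key) ||
        containsNonDeterministicOperator(child)
    );
  }
  return false;
}

// Made-up pipelines for illustration.
const sampled = [{ $sample: { size: 5 } }];
const randomMatch = [{ $match: { $expr: { $lt: [{ $rand: {} }, 0.5] } } }];
const plain = [{ $match: { status: "A" } }, { $project: { _id: 1 } }];

console.log(containsNonDeterministicOperator(sampled));     // true
console.log(containsNonDeterministicOperator(randomMatch)); // true
console.log(containsNonDeterministicOperator(plain));       // false
```

Under this model, only the last pipeline would remain a candidate for uncorrelated caching.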

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Githook User [ 16/Aug/21 ]

Author: jason-price-mongodb

Message: DOCS-13989 disallow lookup uncorrelated pipeline caching
Branch: master
https://github.com/mongodb/docs/commit/e8d47a79d48b680b77b4d1e6b327e9521c970fe6
