Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.9.0
Affects Version/s: None
Component/s: None
Labels:
- qopt-team

Backwards Compatibility:
Minor Change
Operating System:
ALL
Sprint:
Query 2020-09-07, Query 2020-11-02, Query 2020-11-16
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The $sample stage returns a different sample every time it runs. However, $lookup sometimes re-runs its inner pipeline for each outer document, and sometimes runs it only once and reuses the result. This makes the behavior of $sample inside $lookup hard to predict.

For example, this query runs the sub-pipeline only once, so every outer document receives the same sample:

{$lookup: {
	from: 'foreign_coll',
	pipeline: [
		{$sample: {size: 5}},
	],
	as: 'docs',
}}

On the other hand, this query is correlated, so it re-runs the sub-pipeline and chooses a different sample for each outer document:

{$lookup: {
	from: 'foreign_coll',
	let: {outer: "$_id"},
	pipeline: [
		{$match: {$expr: {$lt:["$_id", "$$outer"]}}},  // correlation predicate
		{$sample: {size: 3}},
	],
	as: 'docs',
}}
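The difference between the two queries can be illustrated with a toy model in plain JavaScript. This is a minimal sketch, not MongoDB's implementation: `toyLookup`, `sampleDocs`, and the `cacheable` flag are hypothetical names. A cacheable (uncorrelated) inner pipeline runs once and its result is reused for every outer document, while a non-cacheable one runs once per outer document.

```javascript
// Hypothetical stand-in for {$sample: {size}}: a partial Fisher-Yates shuffle
// on a copy of the input, returning the first `size` shuffled elements.
function sampleDocs(docs, size, rand = Math.random) {
  const copy = docs.slice();
  const n = Math.min(size, copy.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rand() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

// Hypothetical toy model of $lookup. `runInner` plays the role of the inner
// pipeline; `cacheable` models whether the result may be cached.
function toyLookup(outerDocs, runInner, { cacheable }) {
  let cached = null;
  return outerDocs.map((outer) => {
    if (cacheable) {
      // Uncorrelated and cacheable: run the inner pipeline once, then reuse it.
      if (cached === null) cached = runInner(outer);
      return { ...outer, docs: cached };
    }
    // Otherwise re-run the inner pipeline for each outer document.
    return { ...outer, docs: runInner(outer) };
  });
}
```

With caching, every outer document shares one sample; without it, each outer document triggers a fresh run of the inner pipeline and therefore a fresh sample.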

Because we treat DocumentSourceSequentialDocumentCache as an optimization, there may be other exceptions to this rule. For example, a dummy correlation added in the hope of forcing the inner pipeline to re-run can be optimized out, so the pipeline is still cached.

This ticket makes any $sample stage, and any stage containing a $rand or $sampleRate expression, ineligible for uncorrelated pipeline caching.
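The eligibility rule can be sketched as a recursive scan of the pipeline's stage documents. This is a minimal sketch under stated assumptions, not the server's actual (C++) logic: `usesRandomness` and `isPipelineCacheable` are hypothetical names, any nested occurrence of `$sample` (for example inside a nested sub-pipeline) is conservatively treated as random, and the correlation analysis that $lookup also performs is not modeled at all.

```javascript
// Hypothetical set of operators that make a stage non-deterministic.
const RANDOM_KEYS = new Set(["$sample", "$rand", "$sampleRate"]);

// Recursively checks whether any key anywhere in `value` is a random operator.
function usesRandomness(value) {
  if (Array.isArray(value)) return value.some(usesRandomness);
  if (value !== null && typeof value === "object") {
    return Object.entries(value).some(
      ([key, child]) => RANDOM_KEYS.has(key) || usesRandomness(child)
    );
  }
  return false;
}

// A pipeline is treated as cacheable only if no stage uses randomness.
// (Correlation checks are deliberately omitted from this sketch.)
function isPipelineCacheable(pipeline) {
  return pipeline.every((stage) => !usesRandomness(stage));
}
```

For instance, `[{$sample: {size: 5}}]` and `[{$match: {$sampleRate: 0.5}}]` are rejected, while a deterministic `[{$match: {x: 1}}]` remains eligible.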

Assignee:: David Percy (Inactive)
Reporter:: David Percy (Inactive)
Participants:: David Percy, Githook User
Votes:: 0
Watchers:: 5

Created:: Jun 22 2020 09:41:37 PM UTC
Updated:: Oct 29 2023 10:06:40 PM UTC
Resolved:: Nov 10 2020 06:38:32 PM UTC
Confidence Status Last Update:: 21/Oct/20 7:49 PM
