Documentation / DOCS-13989

Investigate changes in SERVER-49024: Disallow $lookup uncorrelated pipeline caching for stages containing $sample/$rand/$sampleRate

Details

    Description

      Downstream Change Summary

      Previously, $sample in an uncorrelated $lookup sub-pipeline behaved inconsistently: depending on the size of the output, the result of the $sample was either cached and reused or the $sample was re-run. The same applied to $rand.

      Now, a sub-pipeline that contains $sample, $rand, or $sampleRate is not treated as "uncorrelated" for caching purposes, so $lookup always re-runs it.

      Description of Linked Ticket

      The $sample stage returns a different sample every time it runs. $lookup sometimes re-runs the inner pipeline per outer document, and sometimes runs it only once. This makes the behavior of $sample inside $lookup hard to predict.

      For example, this query runs the sub-pipeline only once, so every outer document gets the same sample:

      {$lookup: {
      	from: 'foreign_coll',
      	pipeline: [
      		{$sample: {size: 5}},
      	],
      	as: 'docs',
      }}
      

      On the other hand, this query re-runs the sub-pipeline, choosing a different sample per outer document:

      {$lookup: {
      	from: 'foreign_coll',
      	let: {outer: "$_id"},
      	pipeline: [
      		{$match: {$expr: {$lt:["$_id", "$$outer"]}}},  // correlation predicate
      		{$sample: {size: 3}},
      	],
      	as: 'docs',
      }}
      

      Because we consider DocumentSourceSequentialDocumentCache to be an optimization, there may be other exceptions to this rule. For example, a dummy correlation added in the hope of forcing the inner pipeline to re-run can get optimized out.

      This ticket makes any $sample stage, and any stage containing a $rand or $sampleRate expression, ineligible for uncorrelated pipeline caching.
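      The eligibility rule described above can be pictured with a small sketch in plain JavaScript. This is only an illustration of the rule as stated in this ticket, not the server's implementation: the helper name hasNonDeterministicOperator and the sample pipelines are hypothetical, and the sketch deliberately ignores the separate question of whether the sub-pipeline references outer variables.

```javascript
// Hypothetical helper, NOT the server's implementation: reports whether a
// $lookup sub-pipeline contains $sample, $rand, or $sampleRate anywhere
// inside its stages. Per this ticket, such a pipeline would be treated as
// ineligible for uncorrelated pipeline caching.
const NON_DETERMINISTIC = new Set(["$sample", "$rand", "$sampleRate"]);

function hasNonDeterministicOperator(value) {
  if (Array.isArray(value)) {
    // A pipeline or an operator argument list: check every element.
    return value.some(hasNonDeterministicOperator);
  }
  if (value !== null && typeof value === "object") {
    // A stage or expression document: check each key, then recurse.
    return Object.entries(value).some(
      ([key, child]) =>
        NON_DETERMINISTIC.has(key) || hasNonDeterministicOperator(child)
    );
  }
  return false; // strings, numbers, booleans, null
}

// Illustrative pipelines (assumed for this sketch).
const samplePipeline = [{ $sample: { size: 5 } }];
const randPipeline = [{ $match: { $expr: { $lt: [{ $rand: {} }, 0.5] } } }];
const plainPipeline = [{ $match: { status: "A" } }];

console.log(hasNonDeterministicOperator(samplePipeline));
console.log(hasNonDeterministicOperator(randPipeline));
console.log(hasNonDeterministicOperator(plainPipeline));
```

      Under this sketch, the first two pipelines would be re-run for every outer document, while the third contains none of the three operators and so is not excluded by this rule.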

          People

            jason.price@mongodb.com Jason Price
            backlog-server-pm Backlog - Core Eng Program Management Team

            Dates

              Created:
              Updated:
              Resolved:
              2 years, 25 weeks, 2 days ago