  1. Core Server
  2. SERVER-49024

Disallow $lookup uncorrelated pipeline caching for stages containing $sample/$rand/$sampleRate

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.9.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Minor Change
    • ALL
    • Query 2020-09-07, Query 2020-11-02, Query 2020-11-16

      The $sample stage returns a different sample every time it runs. $lookup sometimes re-runs the inner pipeline per outer document, and sometimes runs it only once. This makes the behavior of $sample inside $lookup hard to predict.

      For example, this query runs the sub-pipeline only once, so every outer document receives the same sample:

      {$lookup: {
      	from: 'foreign_coll',
      	pipeline: [
      		{$sample: {size: 5}},
      	],
      	as: 'docs',
      }}
      

      On the other hand, this query re-runs the sub-pipeline, choosing a different sample per outer document:

      {$lookup: {
      	from: 'foreign_coll',
      	let: {outer: "$_id"},
      	pipeline: [
      		{$match: {$expr: {$lt:["$_id", "$$outer"]}}},  // correlation predicate
      		{$sample: {size: 3}},
      	],
      	as: 'docs',
      }}
      

      Because we consider DocumentSourceSequentialDocumentCache to be an optimization, there could be other exceptions to this rule. For example, if you add a dummy correlation hoping to force the inner pipeline to re-run, the optimizer can remove it, and the inner pipeline is cached anyway.
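The difference between the two execution modes can be illustrated with a toy model in plain JavaScript. This is only a sketch: `lookup`, `runInner`, `cacheInner`, and `makeSampler` are hypothetical names invented for illustration, and nothing here reflects how the server actually implements DocumentSourceSequentialDocumentCache.

```javascript
// Toy model only: all names here are hypothetical, not server code.
// `runInner` stands in for executing the $lookup sub-pipeline.
function lookup(outerDocs, runInner, { cacheInner }) {
  let cached = null;
  return outerDocs.map((outer) => {
    if (cacheInner) {
      // Uncorrelated case: run the inner pipeline once and reuse the result.
      if (cached === null) cached = runInner(outer);
      return { ...outer, docs: cached };
    }
    // Correlated case: re-run the inner pipeline for each outer document.
    return { ...outer, docs: runInner(outer) };
  });
}

// Stand-in for a $sample sub-pipeline: picks up to `size` distinct documents
// at random from `foreignDocs` each time it is called.
function makeSampler(foreignDocs, size) {
  return () => {
    const pool = [...foreignDocs];
    const picked = [];
    while (picked.length < size && pool.length > 0) {
      const i = Math.floor(Math.random() * pool.length);
      picked.push(pool.splice(i, 1)[0]);
    }
    return picked;
  };
}
```

With `cacheInner: true`, every outer document shares the one sample drawn on the first call. With `cacheInner: false`, the sampler runs once per outer document, so the samples will usually differ.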

      This ticket makes any $sample stage, and any stage containing a $rand or $sampleRate expression, ineligible for uncorrelated pipeline caching.
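The eligibility rule described above could be sketched as a purely syntactic check over the sub-pipeline. This is an assumption-laden illustration in plain JavaScript, not the server's implementation: `isEligibleForUncorrelatedCaching` and `containsNonDeterministicKey` are hypothetical helpers, and the check simply looks for the three operator names as object keys at any nesting depth.

```javascript
// Hypothetical sketch, not server code: treat a sub-pipeline as ineligible for
// uncorrelated caching if any stage is $sample, or contains $rand or
// $sampleRate anywhere inside it.
const NON_DETERMINISTIC_KEYS = new Set(["$sample", "$rand", "$sampleRate"]);

// Recursively walk arrays and objects, looking for one of the keys above.
function containsNonDeterministicKey(value) {
  if (Array.isArray(value)) {
    return value.some(containsNonDeterministicKey);
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).some(
      ([key, child]) =>
        NON_DETERMINISTIC_KEYS.has(key) || containsNonDeterministicKey(child)
    );
  }
  return false; // numbers, strings, booleans, null
}

function isEligibleForUncorrelatedCaching(pipeline) {
  return !pipeline.some(containsNonDeterministicKey);
}

// Usage: the uncorrelated $sample sub-pipeline from the first example.
console.log(isEligibleForUncorrelatedCaching([{ $sample: { size: 5 } }]));
```

Because the walk is recursive, a $rand buried inside a $match with $expr, or a $sample inside a nested $lookup, is also caught. A real implementation would work on parsed stages and expressions, not raw object keys.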

            Assignee:
            david.percy@mongodb.com David Percy
            Reporter:
            david.percy@mongodb.com David Percy