Core Server / SERVER-63030

Correlated sub-pipeline analysis is incorrect for $unionWith, leading to incorrect results

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 5.0.5, 4.4.12, 5.2.0-rc6
    • Component/s: None
    • Operating System: ALL
    • Steps To Reproduce:

      The following script reproduces the bug:

      (function() {
      "use strict";
      
      const localColl = db.local_coll;
      localColl.drop();
      
      assert.commandWorked(localColl.insert({a: "notMagic"}));
      assert.commandWorked(localColl.insert({a: "magicConstant"}));
      
      const firstUnionColl = db.first_union_coll;
      firstUnionColl.drop();
      assert.commandWorked(firstUnionColl.insert({unionedData: 1}));
      
      const secondUnionColl = db.second_union_coll;
      secondUnionColl.drop();
      assert.commandWorked(secondUnionColl.insert({unionedData: 2}));
      
      let results = localColl.aggregate([
          {$lookup: {
              from: firstUnionColl.getName(),
              as: "as",
              let: {correlatedVar: "$a"},
              pipeline: [
                  {$_internalInhibitOptimization: {}},
                  {$unionWith: {
                      coll: secondUnionColl.getName(),
                      pipeline: [
                          {$match: {$expr: {$eq: ["$$correlatedVar", "magicConstant"]}}}
                      ]
                  }}
              ]
          }}
      ]).toArray();
      
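      // Expected: each "as" array contains {unionedData: 1}; only the document
      // with a: "magicConstant" should also contain {unionedData: 2}.
      printjson(results);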
      }());
      
    • Sprint: QO 2022-02-21

      The $lookup stage analyzes its sub-pipeline to identify a non-correlated prefix, that is, a leading sequence of stages that references none of the variables bound by the $lookup's let. If such a prefix is found, it can be executed just once and its results materialized, avoiding repeated execution of the non-correlated portion of the query for every input document.
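      The prefix search can be pictured with a small sketch. It assumes a hypothetical stage model (plain objects with a `name` and a `varRefs` list of referenced variable names) and a hypothetical function `nonCorrelatedPrefixLength`; it is not the server's C++ implementation, which works with DocumentSource objects.

```javascript
// Minimal sketch of non-correlated prefix detection. ASSUMPTION: each stage
// is a plain object listing the variables it references in `varRefs`.
function nonCorrelatedPrefixLength(pipeline, correlatedVars) {
  let n = 0;
  for (const stage of pipeline) {
    // Stop at the first stage that references any correlated variable;
    // everything from here on must be re-executed per outer document.
    if (stage.varRefs.some((v) => correlatedVars.has(v))) break;
    n++;
  }
  return n;
}

// Only the first two stages are non-correlated, so only they could be
// executed once and their results cached.
const examplePipeline = [
  { name: "$match", varRefs: [] },
  { name: "$project", varRefs: [] },
  { name: "$match", varRefs: ["correlatedVar"] },
  { name: "$sort", varRefs: [] },
];
console.log(nonCorrelatedPrefixLength(examplePipeline, new Set(["correlatedVar"]))); // 2
```

      The search is only as good as each stage's report of its variable references: a stage that under-reports looks non-correlated and extends the prefix.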

      For this optimization to work, the implementation relies on each DocumentSource's getDependencies() implementation to correctly report all of its variable references. This lets the code searching for a non-correlated prefix check whether any correlated variables are used. However, DocumentSourceUnionWith::getDependencies() is currently unimplemented, so it does not report any of the variable references inside its sub-pipeline.
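      To illustrate why the missing getDependencies() matters, the sketch below contrasts a dependency report that ignores a stage's sub-pipeline with one that recurses into it. The stage objects and the functions `reportedVarsBuggy` and `reportedVars` are hypothetical, and the sketch ignores the scoping of any variables a sub-pipeline might define for itself.

```javascript
// ASSUMPTION: hypothetical stage model. A stage lists its own variable
// references in `varRefs`; a stage with a sub-pipeline (like $unionWith)
// also carries `subPipeline`.

// Under-reporting: looks only at the stage itself, mirroring a $unionWith
// stage that does not report its sub-pipeline's variable references.
function reportedVarsBuggy(stage) {
  return new Set(stage.varRefs);
}

// Complete reporting: also collects references from the sub-pipeline,
// recursing so that nested sub-pipelines are covered too.
function reportedVars(stage) {
  const vars = new Set(stage.varRefs);
  for (const inner of stage.subPipeline ?? []) {
    for (const v of reportedVars(inner)) vars.add(v);
  }
  return vars;
}

// Shape of the $unionWith stage from the repro script.
const unionWithStage = {
  name: "$unionWith",
  varRefs: [],
  subPipeline: [{ name: "$match", varRefs: ["correlatedVar"] }],
};

console.log(reportedVarsBuggy(unionWithStage).has("correlatedVar")); // false
console.log(reportedVars(unionWithStage).has("correlatedVar")); // true
```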

      As a result, if a $lookup sub-pipeline contains a $unionWith whose own sub-pipeline references a correlated variable, the correlation analysis incorrectly identifies the $unionWith stage as non-correlated. Materialization of the supposedly non-correlated prefix then kicks in, producing incorrect query results. The repro script above is a concrete example of this situation.
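      The effect on the repro can be simulated in plain JavaScript, without a server. The functions here are hypothetical, and the sketch assumes the cache is filled while processing the first outer document ("notMagic", in insertion order); the server's actual caching mechanics and document order may differ in detail.

```javascript
// Plain-JavaScript simulation of the repro (no server involved).
const localDocs = [{ a: "notMagic" }, { a: "magicConstant" }];
const firstUnion = [{ unionedData: 1 }];
const secondUnion = [{ unionedData: 2 }];

// Sub-pipeline: every document of the first collection, unioned with the
// documents of the second collection that pass the correlated $match.
function runSubPipeline(correlatedVar) {
  const matched = secondUnion.filter(() => correlatedVar === "magicConstant");
  return [...firstUnion, ...matched];
}

// Correct: re-run the correlated sub-pipeline for every outer document.
function lookupCorrect(docs) {
  return docs.map((doc) => ({ ...doc, as: runSubPipeline(doc.a) }));
}

// Buggy: treat the whole sub-pipeline as non-correlated, run it once for
// the first outer document, and reuse that cached result for all others.
function lookupWithWrongCache(docs) {
  let cache = null;
  return docs.map((doc) => {
    if (cache === null) cache = runSubPipeline(doc.a);
    return { ...doc, as: cache };
  });
}

console.log(JSON.stringify(lookupCorrect(localDocs).map((d) => d.as.length))); // [1,2]
console.log(JSON.stringify(lookupWithWrongCache(localDocs).map((d) => d.as.length))); // [1,1]
```

      Under these assumptions, the "magicConstant" document loses {unionedData: 2} because it receives the result computed for "notMagic".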

            Assignee:
            Alya Berciu (alya.berciu@mongodb.com)
            Reporter:
            David Storch (david.storch@mongodb.com)
            Votes:
            0
            Watchers:
            5
