  1. Core Server
  2. SERVER-63030

Correlated sub-pipeline analysis is incorrect for $unionWith, leading to incorrect results


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 5.0.5, 4.4.12, 5.2.0-rc6
    • None
    • None
    • None
    • Operating System: ALL
    • Steps To Reproduce:

      The following script reproduces the bug:

      (function() {
      "use strict";
       
      const localColl = db.local_coll;
      localColl.drop();
       
      assert.commandWorked(localColl.insert({a: "notMagic"}));
      assert.commandWorked(localColl.insert({a: "magicConstant"}));
       
      const firstUnionColl = db.first_union_coll;
      firstUnionColl.drop();
      assert.commandWorked(firstUnionColl.insert({unionedData: 1}));
       
      const secondUnionColl = db.second_union_coll;
      secondUnionColl.drop();
      assert.commandWorked(secondUnionColl.insert({unionedData: 2}));
       
      let results = localColl.aggregate([
          {$lookup: {
              from: firstUnionColl.getName(),
              as: "as",
              let: {correlatedVar: "$a"},
              pipeline: [
                  {$_internalInhibitOptimization: {}},
                  {$unionWith: {
                      coll: secondUnionColl.getName(),
                      pipeline: [
                          {$match: {$expr: {$eq: ["$$correlatedVar", "magicConstant"]}}}
                      ]
                  }}
              ]
          }}
      ]).toArray();
       
      printjson(results);
      }());
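
The ticket does not list the output a correct server should print. As a hedged sketch, assuming the `let` variable `correlatedVar` is visible inside the `$unionWith` sub-pipeline (as the description implies), the `as` array should contain only the `first_union_coll` document for `a: "notMagic"` and both union documents for `a: "magicConstant"`. The helpers `summarizeLookupResults` and `matchesExpected` below are hypothetical names introduced for illustration; they are plain JavaScript and do not call the server.

```javascript
// Hypothetical helper: maps each local document's "a" value to the sorted
// unionedData values found in its "as" array.
function summarizeLookupResults(results) {
    const summary = {};
    for (const doc of results) {
        summary[doc.a] = doc.as.map((d) => d.unionedData).sort((x, y) => x - y);
    }
    return summary;
}

// Expected summary under the assumption stated in the lead-in; not taken from the ticket.
const expectedSummary = {
    notMagic: [1],
    magicConstant: [1, 2],
};

// Returns true when the results agree with the expected summary, ignoring document order.
function matchesExpected(results) {
    const summary = summarizeLookupResults(results);
    const keys = Object.keys(expectedSummary);
    return Object.keys(summary).length === keys.length &&
        keys.every((k) => JSON.stringify(summary[k]) === JSON.stringify(expectedSummary[k]));
}
```

In the shell, `assert(matchesExpected(results))` could be placed after the `printjson(results)` call to turn the repro into a pass/fail check.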
      

    • Sprint: QO 2022-02-21

    Description

      The $lookup stage has some analysis to try to identify a non-correlated prefix of its sub-pipeline. If such a prefix is found, then it can be executed just once and the results materialized, in order to avoid repeated execution of the non-correlated portion of the query.
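
The prefix search can be pictured with a toy model. This is a minimal sketch, not the server's code: each stage is represented by a hypothetical `referencedVars` list, and `nonCorrelatedPrefixLength` is a name made up for illustration. It counts the leading stages that reference none of the `let` variables.

```javascript
// Toy model: each stage lists the variables it references.
// All names here are hypothetical and are not the server's API.
function nonCorrelatedPrefixLength(stages, letVariables) {
    const correlated = new Set(letVariables);
    let length = 0;
    for (const stage of stages) {
        // Stop at the first stage that uses a variable defined by the outer $lookup.
        if (stage.referencedVars.some((v) => correlated.has(v))) {
            break;
        }
        length++;
    }
    return length;
}

const subPipeline = [
    {name: "$match", referencedVars: []},
    {name: "$sort", referencedVars: []},
    {name: "$match", referencedVars: ["correlatedVar"]},
    {name: "$project", referencedVars: []},
];

// The first two stages form the prefix that could be run once and cached.
const prefixLength = nonCorrelatedPrefixLength(subPipeline, ["correlatedVar"]);
```

In this model the result is only as reliable as each stage's `referencedVars` list: a stage that under-reports its references makes the prefix look longer than it is.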

      For this optimization to work, the implementation relies on the getDependencies() implementation of each DocumentSource to correctly report all of its variable references. This allows the code looking for a non-correlated prefix to check whether any correlated variables are used. However, DocumentSourceUnionWith::getDependencies() is unimplemented right now. Therefore, it fails to report any of the variable references inside its sub-pipeline.

      As a result, if there is a $lookup with a $unionWith inside of it, and the $unionWith's sub-pipeline has a correlated variable reference, then the correlation analysis will incorrectly identify the $unionWith stage as non-correlated. Materialization of a supposedly non-correlated pipeline prefix will kick in, resulting in incorrect query results. The attached repro script has a concrete example of this situation.
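
The failure mode can be sketched in a few lines of plain JavaScript. This is an illustrative model under stated assumptions, not the server's implementation: `collectVariableRefs`, `refsIgnoringSubPipeline`, `refsIncludingSubPipeline`, and `isCorrelated` are hypothetical names. One reporting function returns nothing for `$unionWith`, mimicking the missing `getDependencies()`; the other descends into the sub-pipeline.

```javascript
// Toy model of dependency reporting; every function name below is hypothetical.

// Walks a stage specification and records the name of every "$$var" reference.
function collectVariableRefs(value, refs = new Set()) {
    if (typeof value === "string") {
        if (value.startsWith("$$")) {
            refs.add(value.slice(2).split(".")[0]);
        }
    } else if (Array.isArray(value)) {
        value.forEach((v) => collectVariableRefs(v, refs));
    } else if (value !== null && typeof value === "object") {
        Object.values(value).forEach((v) => collectVariableRefs(v, refs));
    }
    return refs;
}

// Mimics the bug: a $unionWith stage reports no variable references at all.
function refsIgnoringSubPipeline(stage) {
    return stage.$unionWith ? new Set() : collectVariableRefs(stage);
}

// Mimics the intended behavior: references inside the sub-pipeline are reported.
function refsIncludingSubPipeline(stage) {
    return collectVariableRefs(stage);
}

// A pipeline is correlated if any stage references one of the let variables.
function isCorrelated(stages, letVariables, refsOf) {
    return stages.some((stage) => [...refsOf(stage)].some((v) => letVariables.includes(v)));
}

const lookupSubPipeline = [
    {$unionWith: {
        coll: "second_union_coll",
        pipeline: [{$match: {$expr: {$eq: ["$$correlatedVar", "magicConstant"]}}}],
    }},
];

const seenByBuggyAnalysis = isCorrelated(lookupSubPipeline, ["correlatedVar"], refsIgnoringSubPipeline);
const seenByFixedAnalysis = isCorrelated(lookupSubPipeline, ["correlatedVar"], refsIncludingSubPipeline);
```

With the under-reporting function the pipeline looks non-correlated, so a caching optimization would wrongly treat it as safe to run once; with full reporting it is flagged as correlated.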

            People

              alya.berciu@mongodb.com Alya Berciu
              david.storch@mongodb.com David Storch