Investigate changes in SERVER-105449: Pushdown of $match past a computed field


    • Type: Investigation
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Developer Tools

      Original Downstream Change Summary

      This is a new high-level optimization for $project+$match or $addFields+$match or $set+$match.

      This optimization can change the field order of the output BSON documents. Such reordering was already permitted for the $project, $set, and $addFields stages, and is documented at https://www.mongodb.com/docs/manual/core/document/#document-field-order
      Note that $set and $project insert computed fields using different undocumented algorithms.

      // Assumes `coll` is a handle to a scratch collection in the current database.
      coll.drop();
      coll.insertMany([{c: 1}]);
      coll.aggregate([
          {$lookup: {as: "a", pipeline: [{$documents: []}]}},
          {$set: {b: {$add: ["$c", 2]}}},
          {$match: {b: {$ne: 1}}},
      ]).toArray();
      
      // Before: [{_id: ObjectId(...), a: [], b: 3}]
      // After:  [{_id: ObjectId(...), b: 3, a: []}]
      

      This is because the $set+$match pair is moved ahead of the $lookup, so 'b' is inserted first and 'a' second. The results are not binary-equal, but they are equal when field order is ignored.
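
      A test that currently compares results byte-for-byte could instead compare them with field order ignored. The following is a minimal sketch in plain JavaScript. The helper names canonicalize and equalIgnoringFieldOrder are hypothetical, and the numeric _id stands in for the ObjectId in the example above. It handles only plain objects, arrays, and primitives; BSON types such as ObjectId or Date would need extra handling.

```javascript
// Hypothetical helper (not part of mongosh or the server): returns a copy of
// `value` with object keys sorted recursively. Array element order is kept.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return value.map(canonicalize);
  }
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: true when both values are equal once key order is ignored.
function equalIgnoringFieldOrder(left, right) {
  return JSON.stringify(canonicalize(left)) === JSON.stringify(canonicalize(right));
}

// Shapes taken from the Before/After comments above, with _id: 1 as a stand-in.
const before = [{ _id: 1, a: [], b: 3 }];
const after = [{ _id: 1, b: 3, a: [] }];

console.log(JSON.stringify(before) === JSON.stringify(after)); // false
console.log(equalIgnoringFieldOrder(before, after)); // true
```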

      Description of Linked Ticket

      Imagine a pipeline that computes a new field and then matches on the result. We've recently seen an example of a customer query which involves something like this:

      [  
        {  
          $project: {  
            computed: {  
              $switch: {  
                branches: [  
                  {  
                    case: {  
                      $eq: [  
                        "$comparison_field",  
                        "foo"  
                      ]  
                    },  
                    then: "$field1"  
                  },  
                  {  
                    case: {  
                      $eq: [  
                        "$comparison_field",  
                        "bar"  
                      ]  
                    },  
                    then: "$field2"  
                  }  
                ],  
                default: { $const: null }  
              }  
            }  
          }  
        },  
        {  
          $match: {  
            computed: 42  
          }  
        }  
      ] 
      

      At the moment, the system is unable to push the $match down past the projection. Since the field is computed, this predicate is not sargable and would not help to tighten index bounds for access path selection. However, this pushdown could still be quite valuable. We saw it in the context of a relational migration in which the $match was a predicate applied on top of a complex view. The pushdown would allow the match to move all the way down to the base collections of the view; filtering the result set that early leaves any subsequent $lookup stages with much less work to do.

      Pushing the $match down would require expressing a predicate on top of some computation. Theoretically this could look something like the example below:

      [  
        {  
          $match: {  
            $expr: {  
              $eq: [  
                42,
                { // Inline the $switch!
                  $switch: {  
                    branches: [  
                      {  
                        case: {  
                          $eq: [  
                            "$comparison_field",  
                            "foo"  
                          ]  
                        },  
                        then: "$field1"  
                      },  
                      {  
                        case: {  
                          $eq: [  
                            "$comparison_field",  
                            "bar"  
                          ]  
                        },  
                        then: "$field2"  
                      }  
                    ],  
                    default: null  
                  }  
                }  
              ]  
            }  
          }  
        }  
      ]  
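
      The shape of that rewrite can be sketched as a purely syntactic transformation in plain JavaScript. The function name inlineEqualityMatch is hypothetical, and the sketch handles only a single-field numeric equality match on a computed $project field. It deliberately ignores the semantic differences between match predicates and aggregation expressions, so it is not a correct rewrite in general.

```javascript
// Hypothetical helper, syntactic only: builds a {$match: {$expr: ...}} stage by
// inlining the expression that computes the matched field.
function inlineEqualityMatch(projectStage, matchStage) {
  const entries = Object.entries(matchStage.$match);
  if (entries.length !== 1) {
    throw new Error("only single-field matches are handled in this sketch");
  }
  const [field, constant] = entries[0];
  if (typeof constant !== "number") {
    throw new Error("only numeric equality constants are handled in this sketch");
  }
  const computation = projectStage.$project[field];
  if (computation === null || typeof computation !== "object") {
    throw new Error("'" + field + "' is not a computed field of the $project");
  }
  return { $match: { $expr: { $eq: [constant, computation] } } };
}

// A shortened version of the pipeline from the ticket (one $switch branch).
const projectStage = {
  $project: {
    computed: {
      $switch: {
        branches: [
          { case: { $eq: ["$comparison_field", "foo"] }, then: "$field1" },
        ],
        default: null,
      },
    },
  },
};
const matchStage = { $match: { computed: 42 } };

const pushedDown = inlineEqualityMatch(projectStage, matchStage);
console.log(JSON.stringify(pushedDown, null, 2));
```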
      

      In practice, this would require correct conversion of a MatchExpression into an Expression. I don't believe this is something our system can currently do, and we will have to be careful about things like array traversal semantics and null semantics to make such a rewrite actually correct in today's engine.
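
      To make the two pitfalls concrete, here is a toy model in plain JavaScript of how equality behaves in the match language versus in an aggregation expression. The functions matchEq and exprEq are hypothetical simplifications based on documented query behavior, not server code. undefined stands in for a missing field, and only scalar targets are modeled.

```javascript
// Toy models of the two equality semantics. These are illustrative
// assumptions, NOT MongoDB source code. `undefined` stands in for a missing
// field, and only scalar (non-array, non-object) targets are modeled.

// Match-language equality, as in {$match: {computed: target}}.
function matchEq(value, target) {
  if (value === undefined) {
    return target === null; // {field: null} also matches a missing field
  }
  if (Array.isArray(value)) {
    return value.some((element) => element === target); // implicit array traversal
  }
  return value === target;
}

// Aggregation-expression equality, as in {$expr: {$eq: ["$computed", target]}}.
function exprEq(value, target) {
  return value === target; // no array traversal; missing is not equal to null
}

const samples = [
  { label: "scalar 42 vs 42", value: 42, target: 42 },
  { label: "array [1, 42] vs 42", value: [1, 42], target: 42 },
  { label: "null vs null", value: null, target: null },
  { label: "missing vs null", value: undefined, target: null },
];

for (const { label, value, target } of samples) {
  console.log(label, "match:", matchEq(value, target), "expr:", exprEq(value, target));
}
```

      The second and fourth samples are the cases where the two models disagree, which is why inlining the computation under $expr is not automatically equivalent to the original $match.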

            Assignee: Unassigned
            Reporter: Backlog - Core Eng Program Management Team