Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-27213

Two $match stages combine incorrectly, yielding incorrect results.

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:
      Hide

      Using MongoDB 3.4, run the following commands :

      use threefour;
      db.test.insert({
      	"id":2,
      	"name":"MongoDB",
      	"age":NumberInt(7)
      });
       
      db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ])
      

      >> Will return nothing, but should return the projected document.

      Show
      Using MongoDB 3.4, run the following commands : use threefour; db.test.insert({ "id":2, "name":"MongoDB", "age":NumberInt(7) });   db.test.aggregate([ {"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}}, {'$project':{"age":1,"name":1, '_id':0}}, {'$match':{'age':{'$gte':5}}}, {'$match':{'name':{'$eq':'MongoDB'} }} ]) >> Will return nothing, but should return the projected document.
    • Sprint:
      Query 2016-12-12

      Description

      In certain scenarios, when an aggregation pipeline has two $match stages either consecutively or in positions such that they will be swapped to be next to each other, the pipeline optimization will combine the two predicates using a $and. In this case the predicates of the second match will not be applied correctly, yielding incorrect results.

      To see if your pipeline is impacted by this issue you can run explain on your aggregation and look for two $match stages being combined like those above. Note that $match stages that are at the front of the pipeline or near the front of the pipeline may be absorbed into the query system and will appear in the $cursor section of the explain output. If this is the case, these matches will occur correctly.

      In more detail, the $match stage has a concept of which 'dependencies' are needed to perform the match. For example, this stage

      {$match: {age: {$lte: 15}, score: {$gt: 95}}}
      

      would need the 'age' field and the 'score' field in order to perform the match. As part of the matching process, the fields needed to answer the predicate will be serialized into a representation (a BSONObj) that can be understood by the query system's matching machinery. When two $match stages are combined, the first $match stage will absorb the second one, but it will not recompute which fields are needed to perform the match.

      For example, if our pipeline was

      [{$match: {age: {$lt: 15}, score: {$gt: 95}}}, {$match: {class: {$eq: "chemistry"}}}]
      

      it would be optimized to just a single $match:

      [{$match: {$and: [{age: {$lt: 15}, score: {$gt: 95}}, {class: {$eq: "chemistry"}}]}}]
      

      This combined version of the $match would incorrectly think that it only needed to look at the 'age' and 'score' fields to decide if it matched the predicate, so the 'class' field would always be missing from the serialized form, and so would never compare equal to 'chemistry'.

      Note that without other stages, the pipeline used in this example would behave correctly since the $match stages could be absorbed by the query system. For this issue to occur, there would need to be some stage in front of the $matches (like a $group or a $limit, for example) that would prevent the matches from being moved earlier.

      Original Description

      The problem occurs on MongoDB 3.4.0-rc5 and 3.4.0 GA.

      I didn't occur before and version 3.2.4 and 3.2.10 work fine (didn't test in between).

      Say I have a collection with only 1 document :

      {{db.test.insert({
      	"id":2,
      	"name":"MongoDB",
      	"age":NumberInt(7)
      });}}
      

      A simple query like that works :

      {{db.test.aggregate([
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ]);
      >> { "_id" : ObjectId("583dd8a03c6145e913392a6e"), "id" : 2, "name" : "MongoDB", "age" : 7 }}}
      

      Adding a $project stage does not break the request :

      {{db.test.aggregate([
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ]);
      >> { "name" : "MongoDB", "age" : 7 }}}
      

      But adding a $group stage at the beginning does break the request (without error) and returns nothing :

      {{db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ]);}}
       
      Please note that running the $match stages before the $group does not break the request.
      {{db.test.aggregate([
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }},
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}}
      ]);}}
       
      I ran an explain query on it, and it appears however that the $match stage is correct except that the stages are not in the correct order (they are in 3.2.10).
      {{db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ], {'explain':true})}}
      >> gives $cursor, $group, $match, $project in a sorted array.
      

      Finally, moving the $project between the 2 $match does not help, and output the same explain (it does not in 3.2.10, with the correct order and 2 distinct $match stages).

      {{db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ], {'explain':true})}}
      

      This is a major issue and is preventing us from moving from 3.2 to 3.4.

      Off course, the pipelines here are for this bug reporting only, but we actually have programaticaly built pipelines with consecutive $match operations.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                14 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: