Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-27213

Two $match stages combine incorrectly, yielding incorrect results.

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:
      Hide

      Using MongoDB 3.4, run the following commands :

      use threefour;
      db.test.insert({
      	"id":2,
      	"name":"MongoDB",
      	"age":NumberInt(7)
      });
       
      db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ])
      

      >> Will return nothing, but should return the projected document.

      Show
      Using MongoDB 3.4, run the following commands : use threefour; db.test.insert({ "id":2, "name":"MongoDB", "age":NumberInt(7) });   db.test.aggregate([ {"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}}, {'$project':{"age":1,"name":1, '_id':0}}, {'$match':{'age':{'$gte':5}}}, {'$match':{'name':{'$eq':'MongoDB'} }} ]) >> Will return nothing, but should return the projected document.
    • Sprint:
      Query 2016-12-12

      Description

      In certain scenarios, when an aggregation pipeline has two $match stages either consecutively or in positions such that they will be swapped to be next to each other, the pipeline optimization will combine the two predicates using a $and. In this case the predicates of the second match will not be applied correctly, yielding incorrect results.

      To see if your pipeline is impacted by this issue you can run explain on your aggregation and look for two $match stages being combined like those above. Note that $match stages that are at the front of the pipeline or near the front of the pipeline may be absorbed into the query system and will appear in the $cursor section of the explain output. If this is the case, these matches will occur correctly.

      In more detail, the $match stage has a concept of which 'dependencies' are needed to perform the match. For example, this stage

      {$match: {age: {$lte: 15}, score: {$gt: 95}}}
      

      would need the 'age' field and the 'score' field in order to perform the match. As part of the matching process, the fields needed to answer the predicate will be serialized into a representation (a BSONObj) that can be understood by the query system's matching machinery. When two $match stages are combined, the first $match stage will absorb the second one, but it will not recompute which fields are needed to perform the match.

      For example, if our pipeline was

      [{$match: {age: {$lt: 15}, score: {$gt: 95}}}, {$match: {class: {$eq: "chemistry"}}}]
      

      it would be optimized to just a single $match:

      [{$match: {$and: [{age: {$lt: 15}, score: {$gt: 95}}, {class: {$eq: "chemistry"}}]}}]
      

      This combined version of the $match would incorrectly think that it only needed to look at the 'age' and 'score' fields to decide if it matched the predicate, so the 'class' field would always be missing from the serialized form, and so would never compare equal to 'chemistry'.

      Note that without other stages, the pipeline used in this example would behave correctly since the $match stages could be absorbed by the query system. For this issue to occur, there would need to be some stage in front of the $matches (like a $group or a $limit, for example) that would prevent the matches from being moved earlier.

      Original Description

      The problem occurs on MongoDB 3.4.0-rc5 and 3.4.0 GA.

      I didn't occur before and version 3.2.4 and 3.2.10 work fine (didn't test in between).

      Say I have a collection with only 1 document :

      {{db.test.insert({
      	"id":2,
      	"name":"MongoDB",
      	"age":NumberInt(7)
      });}}
      

      A simple query like that works :

      {{db.test.aggregate([
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ]);
      >> { "_id" : ObjectId("583dd8a03c6145e913392a6e"), "id" : 2, "name" : "MongoDB", "age" : 7 }}}
      

      Adding a $project stage does not break the request :

      {{db.test.aggregate([
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ]);
      >> { "name" : "MongoDB", "age" : 7 }}}
      

      But adding a $group stage at the beginning does break the request (without error) and returns nothing :

      {{db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ]);}}
       
      Please note that running the $match stages before the $group does not break the request.
      {{db.test.aggregate([
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }},
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}}
      ]);}}
       
      I ran an explain query on it, and it appears however that the $match stage is correct except that the stages are not in the correct order (they are in 3.2.10).
      {{db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ], {'explain':true})}}
      >> gives $cursor, $group, $match, $project in a sorted array.
      

      Finally, moving the $project between the 2 $match does not help, and output the same explain (it does not in 3.2.10, with the correct order and 2 distinct $match stages).

      {{db.test.aggregate([
      	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
      	{'$match':{'age':{'$gte':5}}},
      	{'$project':{"age":1,"name":1, '_id':0}},
      	{'$match':{'name':{'$eq':'MongoDB'} }}
      ], {'explain':true})}}
      

      This is a major issue and is preventing us from moving from 3.2 to 3.4.

      Off course, the pipelines here are for this bug reporting only, but we actually have programaticaly built pipelines with consecutive $match operations.

        Issue Links

          Activity

          Hide
          hmducoulombier@marketing1by1.com Henri-Maxime Ducoulombier added a comment - - edited

          3.4.0-rc3 is also impacted

          Here is the explain for the following request :

          db.test.aggregate([
          	{"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},
           
          		{'$match':{'age':{'$gte':5}}},
          	{'$project':{"age":1,"name":1, '_id':0}},
          	{'$match':{'name':{'$eq':'MongoDB'} }}
          ], {'explain':true})
          

          Explain:

          {
          	"stages" : [
          		{
          			"$cursor" : {
          				"query" : {
          					
          				},
          				"fields" : {
          					"age" : 1,
          					"name" : 1,
          					"_id" : 1
          				},
          				"queryPlanner" : {
          					"plannerVersion" : 1,
          					"namespace" : "threefour.test",
          					"indexFilterSet" : false,
          					"parsedQuery" : {
          						
          					},
          					"winningPlan" : {
          						"stage" : "COLLSCAN",
          						"direction" : "forward"
          					},
          					"rejectedPlans" : [ ]
          				}
          			}
          		},
          		{
          			"$group" : {
          				"_id" : "$_id",
          				"name" : {
          					"$first" : "$name"
          				},
          				"age" : {
          					"$first" : "$age"
          				}
          			}
          		},
          		{
          			"$match" : {
          				"$and" : [
          					{
          						"age" : {
          							"$gte" : 5
          						}
          					},
          					{
          						"name" : {
          							"$eq" : "MongoDB"
          						}
          					}
          				]
          			}
          		},
          		{
          			"$project" : {
          				"_id" : false,
          				"name" : true,
          				"age" : true
          			}
          		}
          	],
          	"ok" : 1
          }
          

          Show
          hmducoulombier@marketing1by1.com Henri-Maxime Ducoulombier added a comment - - edited 3.4.0-rc3 is also impacted Here is the explain for the following request : db.test.aggregate([ {"$group":{'_id':'$_id', 'name':{'$first':'$name'}, 'age':{'$first':'$age'}}},   {'$match':{'age':{'$gte':5}}}, {'$project':{"age":1,"name":1, '_id':0}}, {'$match':{'name':{'$eq':'MongoDB'} }} ], {'explain':true}) Explain: { "stages" : [ { "$cursor" : { "query" : { }, "fields" : { "age" : 1, "name" : 1, "_id" : 1 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "threefour.test", "indexFilterSet" : false, "parsedQuery" : { }, "winningPlan" : { "stage" : "COLLSCAN", "direction" : "forward" }, "rejectedPlans" : [ ] } } }, { "$group" : { "_id" : "$_id", "name" : { "$first" : "$name" }, "age" : { "$first" : "$age" } } }, { "$match" : { "$and" : [ { "age" : { "$gte" : 5 } }, { "name" : { "$eq" : "MongoDB" } } ] } }, { "$project" : { "_id" : false, "name" : true, "age" : true } } ], "ok" : 1 }
          Hide
          hmducoulombier@marketing1by1.com Henri-Maxime Ducoulombier added a comment -

          (sorry for the bad formatting)

          Show
          hmducoulombier@marketing1by1.com Henri-Maxime Ducoulombier added a comment - (sorry for the bad formatting)
          Hide
          charlie.swanson Charlie Swanson added a comment -

          It looks like the two match stages are being combined into one, but we fail to recompute the dependencies of the match. In the reproducer, the first match knows that it only needs the 'age' field (computed in the constructor here). Later, we optimize the pipeline and combine the two matches into a single match stage by using DocumentSourceMatch::joinMatchWith(). This joining process absorbs the additional predicate on 'name', but fails to recompute the dependencies, so during getNext() it incorrectly thinks that the dependencies are only the 'age' field, and so it only serializes the BSON object {age: 7} to be matched against the query predicate

          {
          			"$match" : {
          				"$and" : [
          					{
          						"name" : "MongoDB"
          					},
          					{
          						"age" : {
          							"$gte" : 5
          						}
          					}
          				]
          			}
          		}
          

          Which of course doesn't match.

          Show
          charlie.swanson Charlie Swanson added a comment - It looks like the two match stages are being combined into one, but we fail to recompute the dependencies of the match. In the reproducer, the first match knows that it only needs the 'age' field (computed in the constructor here ). Later, we optimize the pipeline and combine the two matches into a single match stage by using DocumentSourceMatch::joinMatchWith() . This joining process absorbs the additional predicate on 'name', but fails to recompute the dependencies, so during getNext() it incorrectly thinks that the dependencies are only the 'age' field, and so it only serializes the BSON object { age: 7 } to be matched against the query predicate { "$match" : { "$and" : [ { "name" : "MongoDB" }, { "age" : { "$gte" : 5 } } ] } } Which of course doesn't match.
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

          Message: SERVER-27213 Recompute dependencies when joining matches
          Branch: master
          https://github.com/mongodb/mongo/commit/5a58e613d59febb9c32ae1940cbb8663cabe5095

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'} Message: SERVER-27213 Recompute dependencies when joining matches Branch: master https://github.com/mongodb/mongo/commit/5a58e613d59febb9c32ae1940cbb8663cabe5095
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

          Message: SERVER-27213 Recompute dependencies when joining matches
          Branch: v3.4
          https://github.com/mongodb/mongo/commit/15290f5fe583d4d77e865046c1d8b8d415c2f457

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'} Message: SERVER-27213 Recompute dependencies when joining matches Branch: v3.4 https://github.com/mongodb/mongo/commit/15290f5fe583d4d77e865046c1d8b8d415c2f457

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                  Agile