Core Server / SERVER-91102

Moving $match before $group is incorrect when predicate distinguishes equal values

    • Query Optimization
    • Fully Compatible
    • ALL

      In SERVER-34741 we added this optimization:

      {$group: {_id: "$foo", ...}}, {$match: {_id: ...}}
      ->
      {$match: {foo: ...}}, {$group: {_id: "$foo", ...}}
      

      When a $match references only the group key (_id), we move it ahead of the $group and rename _id to the field path the group key was computed from.

      This rewrite is incorrect for a $type predicate, because $type can distinguish between values that compare equal, and $group puts values that compare equal into the same group.

      For example:

      > db.c.find()
      { "_id" : 1, "a" : NumberLong(5) }
      { "_id" : 2, "a" : 5 }
      
      > db.c.aggregate([ {$group: {_id: "$a", n: {$count: {}} }} ])
      { "_id" : NumberLong(5), "n" : 2 }
      
      > db.c.aggregate([ {$group: {_id: "$a", n: {$count: {}} }}, {$match: {_id: {$type: 'long'}}} ])
      { "_id" : NumberLong(5), "n" : 1 }
      

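      The $match after the $group should only be able to keep or drop whole groups, but here it changed the count 'n' within a group: the expected result is n: 2, and the pushed-down $match removed the double 5 before it could be counted.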
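
The discrepancy can be reproduced without a server. The following is a minimal sketch in plain JavaScript, not the server's implementation: the `{type, value}` encoding, `groupByA`, and `isLong` are hypothetical names invented for illustration, and the sketch assumes the first value seen becomes the group's _id.

```javascript
// Toy model: each value carries a BSON-like type tag and a numeric value.
// Two values "compare equal" when their numeric values match, whatever the type.
const docs = [
  { _id: 1, a: { type: "long", value: 5 } },
  { _id: 2, a: { type: "double", value: 5 } },
];

// Stand-in for {$group: {_id: "$a", n: {$count: {}}}}.
// Assumption: the first value seen becomes the group's representative _id.
function groupByA(input) {
  const groups = new Map();
  for (const doc of input) {
    const key = doc.a.value; // equal values share one key
    if (!groups.has(key)) groups.set(key, { _id: doc.a, n: 0 });
    groups.get(key).n += 1;
  }
  return [...groups.values()];
}

// Stand-in for {$type: 'long'}: it inspects the type tag, so it can tell
// apart two values that compare equal.
const isLong = (v) => v.type === "long";

// Original order: $group, then $match on _id.
const groupThenMatch = groupByA(docs).filter((g) => isLong(g._id));

// Rewritten order: $match on a, then $group.
const matchThenGroup = groupByA(docs.filter((d) => isLong(d.a)));

console.log(groupThenMatch.map((g) => g.n)); // [ 2 ]
console.log(matchThenGroup.map((g) => g.n)); // [ 1 ]
```

The two orders agree on which group survives but disagree on its count, which is the behavior shown in the shell transcript above.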

      We should only push down the predicate when it treats equal values the same. And since we may add new predicates over time, the optimization should be an allowlist: enable it only for predicates we know are safe, rather than disabling it for the specific predicates we know are unsafe.
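
One way to picture the allowlist approach is sketched below. This is an illustrative assumption, not the server's code: `SAFE_OPS` and `isSafeToPushDown` are hypothetical names, and the operators listed are placeholders rather than a verified set of predicates that are safe to push down.

```javascript
// Hypothetical allowlist of operators assumed to give the same answer for any
// two values that compare equal. The contents are illustrative only.
const SAFE_OPS = new Set(["$eq", "$gt", "$gte", "$lt", "$lte"]);

// Decide whether a predicate on the group key, shaped like {$gt: 3, $lt: 10},
// may be moved ahead of the $group. Anything not explicitly allowed is
// rejected, so a newly added operator is unsafe until someone vets it.
function isSafeToPushDown(fieldPredicate) {
  const isPlainObject =
    fieldPredicate !== null &&
    typeof fieldPredicate === "object" &&
    !Array.isArray(fieldPredicate);
  if (!isPlainObject) {
    return false; // conservatively reject shapes this sketch does not model
  }
  const ops = Object.keys(fieldPredicate);
  return ops.length > 0 && ops.every((op) => SAFE_OPS.has(op));
}

console.log(isSafeToPushDown({ $gt: 3, $lt: 10 }));  // true
console.log(isSafeToPushDown({ $type: "long" }));    // false
console.log(isSafeToPushDown({ $gt: 3, $regex: "x" })); // false
```

The design choice is that the default answer is "no": a denylist would silently allow every operator nobody has thought about yet.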

      Other things to consider:

      • Custom collations affect what "compare equal" means. Which predicates can distinguish values that are collation-equal? Maybe $regex would be one.
      • Does this interact with SERVER-73253, which extended this optimization to support dotted paths?
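
The collation concern can be sketched the same way. This is a toy model under stated assumptions: `collationKey` stands in for a hypothetical case-insensitive collation, and `startsUpper` stands in for a pattern predicate that inspects the raw string rather than the collation key. Whether a given server predicate behaves this way is exactly the open question above.

```javascript
// Hypothetical case-insensitive collation: two strings are collation-equal
// when their lowercase forms match.
const collationKey = (s) => s.toLowerCase();

const names = ["abc", "ABC"];

// Group strings by collation key and count them.
// Assumption: the first string seen becomes the group's representative _id.
function groupAndCount(values) {
  const groups = new Map();
  for (const s of values) {
    const key = collationKey(s);
    if (!groups.has(key)) groups.set(key, { _id: s, n: 0 });
    groups.get(key).n += 1;
  }
  return [...groups.values()];
}

// A pattern predicate that looks at the raw string, so it can distinguish
// "abc" from "ABC" even though they are collation-equal.
const startsUpper = (s) => /^[A-Z]/.test(s);

// Predicate after grouping: the one group is represented by "abc", so it is dropped.
const groupThenMatch = groupAndCount(names).filter((g) => startsUpper(g._id));

// Predicate before grouping: "ABC" survives and forms a group of one.
const matchThenGroup = groupAndCount(names.filter(startsUpper));

console.log(groupThenMatch.length); // 0
console.log(matchThenGroup.length); // 1
```

In this model the reordering changes which groups exist at all, not just a count within a group.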

            Assignee:
            daniel.segel@mongodb.com Daniel Segel
            Reporter:
            david.percy@mongodb.com David Percy