Core Server / SERVER-79205

[CQF] $not $eq array, on a missing field, should return true

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • ALL
    • QO 2023-08-21

      This is something the fuzzer discovered.

      Example collection and queries on classic:

      > db1.c.find()
      { "_id" : 5 }
      
      > db1.c.find({array: {$eq: ['a']}})
      // empty results
      
      > db1.c.find({array: {$not: {$eq: ['a']}}})
      { "_id" : 5 }
      

      Same example with forceBonsai:

      > db2.c.find()
      { "_id" : 5 }
      
      > db2.c.find({array: {$eq: ['a']}})
      // empty results
      
      > db2.c.find({array: {$not: {$eq: ['a']}}})
      // empty results -- incorrect
      

      This only happens when the constant is an array. For example, the same query with a scalar constant returns the correct result on both engines:

      > db1.c.find({array: {$not: {$eq: 'a'}}})
      { "_id" : 5 }
      > db2.c.find({array: {$not: {$eq: 'a'}}})
      { "_id" : 5 }
      

      The Bonsai plan before optimization is:

      ********* Translated ABT *********
      explain : Root [{p0}]
      Filter []
      |   EvalFilter []
      |   |   Variable [p0]
      |   PathConstant [] UnaryOp [Not] EvalFilter []
      |   |   Variable [p0]
      |   PathGet [array] PathComposeA []
      |   |   PathCompare [Eq] Const [["a"]]
      |   PathTraverse [1] PathCompare [Eq] Const [["a"]]
      Scan [c_808eb2a6-144a-4226-a997-e279bb2cc7c3, {p0}]
      
      ********* Translated ABT *********
      

      and after optimization:

      ********* Optimized ABT *********
      explain :
      Root [{p0}]
      Filter []
      |   EvalFilter []
      |   |   Variable [p0]
      |   PathGet [array] PathCompare [Neq] Const [["a"]]
      Filter []
      |   EvalFilter []
      |   |   Variable [p0]
      |   PathGet [array] PathLambda [] LambdaAbstraction [p2] UnaryOp [Not] EvalFilter []
      |   |   Variable [p2]
      |   PathTraverse [1] PathCompare [Eq] Const [["a"]]
      PhysicalScan [{'<root>': p0}, c_808eb2a6-144a-4226-a997-e279bb2cc7c3]
      
      ********* Optimized ABT *********
      

      It looks like the relevant rewrites are:

      • 'class NotPushdown', which includes:
        • pushing down Not through PathComposeA disjunction (DeMorgan's law)
        • combining Not Eq into Neq
      • 'struct SubstituteConvert<FilterNode>', which splits a PathComposeM conjunction into a sequence of two Filter stages.
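
      To make the shape of the Not pushdown concrete, here is a toy sketch of the two rules listed above, applied to a simplified version of the translated path. The node shapes and the pushNot function are hypothetical and only illustrate the idea; they are not the server's ABT classes or the real NotPushdown code.

```javascript
// Toy illustration only: plain objects stand in for ABT path nodes.
// pushNot(path) returns a path meant to be equivalent to Not(path).
function pushNot(path) {
  if (path.op === "ComposeA") {
    // De Morgan: Not (a OR b)  =>  (Not a) AND (Not b)
    return { op: "ComposeM", left: pushNot(path.left), right: pushNot(path.right) };
  }
  if (path.op === "Compare" && path.cmp === "Eq") {
    // Not Eq  =>  Neq
    return { op: "Compare", cmp: "Neq", value: path.value };
  }
  // Anything else (for example a Traverse) keeps an explicit Not around it.
  return { op: "Not", child: path };
}

// Simplified form of: PathComposeA (PathCompare Eq ["a"]) (PathTraverse (PathCompare Eq ["a"]))
const translated = {
  op: "ComposeA",
  left: { op: "Compare", cmp: "Eq", value: ["a"] },
  right: { op: "Traverse", child: { op: "Compare", cmp: "Eq", value: ["a"] } },
};

const rewritten = pushNot(translated);
console.log(JSON.stringify(rewritten, null, 2));
```

      The result is a ComposeM of a Neq comparison and a negated Traverse, which matches the two Filter stages in the optimized plan once the conjunction is split.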

      This must be happening due to confusion around Nothing vs False. Missing fields are represented as Nothing, and most operations on Nothing return Nothing. I think even UnaryOp Not preserves Nothing. But at some point we coerce Nothing to False: this either happens in the Filter stage or in the EvalFilter expression.
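
      A toy model of this hypothesis, as a sketch only: it assumes comparisons and Not propagate Nothing, and that the coercion to False happens at each EvalFilter. The report does not confirm where the coercion really happens, and every helper below (NOTHING, getField, eq, neq, not, coerceToBool) is hypothetical. The traverse branch is left out for brevity.

```javascript
// Toy model only, not server code. NOTHING stands in for the Nothing value.
const NOTHING = Symbol("Nothing");

// A missing field evaluates to Nothing.
const getField = (doc, name) =>
  Object.prototype.hasOwnProperty.call(doc, name) ? doc[name] : NOTHING;

// Assumption: comparisons and Not propagate Nothing.
const eq = (v, c) => (v === NOTHING ? NOTHING : JSON.stringify(v) === JSON.stringify(c));
const neq = (v, c) => {
  const r = eq(v, c);
  return r === NOTHING ? NOTHING : !r;
};
const not = (b) => (b === NOTHING ? NOTHING : !b);

// Assumption: EvalFilter coerces Nothing to false.
const coerceToBool = (b) => (b === NOTHING ? false : b);

// Before optimization: Not sits outside the inner EvalFilter,
// so it sees false (already coerced) and returns true.
function beforeRewrite(doc) {
  const inner = coerceToBool(eq(getField(doc, "array"), ["a"]));
  return coerceToBool(not(inner));
}

// After optimization: Neq is evaluated on the field directly,
// so Nothing reaches the coercion and the document is rejected.
function afterRewrite(doc) {
  return coerceToBool(neq(getField(doc, "array"), ["a"]));
}

console.log(beforeRewrite({ _id: 5 })); // true
console.log(afterRewrite({ _id: 5 }));  // false
```

      In this model the two plans agree whenever the field is present and differ only on the missing field, which matches the behavior shown above. If the coercion instead happened only in the Filter stage, the unoptimized plan would also reject the document, so the model is one possible explanation rather than a diagnosis.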

      I would probably first try disabling the above rewrites and see whether that makes the result correct. Then we can decide what changes are necessary to the rewrites, or to the initial ABT translation.

            Assignee: backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter: david.percy@mongodb.com David Percy