Core Server / SERVER-49721

[SBE] Using dot notation in find() to query embedded document in array field is broken in some cases

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Querying
    • Labels:
    • ALL
    • Steps to Reproduce:
      > db.adminCommand({setParameter: 1, internalQueryEnableSlotBasedExecutionEngine: false})
      { "was" : false, "ok" : 1 }
      > db.c.find({},{_id:0})
      { "a" : [ {  }, { "b" : "foo" } ] }
      { "a" : [ { "b" : "foo" }, {  } ] }
      > db.c.find({"a.b":"foo"},{_id:0})
      { "a" : [ {  }, { "b" : "foo" } ] }
      { "a" : [ { "b" : "foo" }, {  } ] }
      > db.adminCommand({setParameter: 1, internalQueryEnableSlotBasedExecutionEngine: true})
      { "was" : false, "ok" : 1 }
      > db.c.find({"a.b":"foo"},{_id:0})
      { "a" : [ { "b" : "foo" }, {  } ] }

       

    • Query 2020-07-27

      While looking at possible ways to refactor some of the logic in "sbe_stage_builder_filter.cpp", I enabled SBE mode and tested some cases that use dot notation in find() to query embedded documents inside an array field, and noticed that the results were wrong in some cases.

      Given a document whose field "a" contains an array of two embedded documents, when the dot notation "a.b" is used in find() to match against the contents of field "b", the query appears never to match the second embedded document if the first embedded document does not contain field "b". See the "Steps to Reproduce" section for a concrete example.
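
To make the expected behavior concrete, here is a minimal JavaScript model of what the classic engine's output in the repro implies for {"a.b": "foo"} when "a" is an array of embedded documents: a document matches if any array element has a field "b" equal to "foo", and elements without "b" are simply skipped. The function name matchesDotPath() is hypothetical, and the model deliberately ignores other dot-notation cases (non-array "a", nested arrays, and so on).

```javascript
// Hypothetical model (not MongoDB source) of the classic engine's result for
// {"a.b": value} when doc.a is an array of embedded documents.
function matchesDotPath(doc, outer, inner, value) {
  const arr = doc[outer];
  // This sketch only covers the "array of embedded documents" case.
  if (!Array.isArray(arr)) return false;
  // Match if ANY element has field `inner` equal to `value`; elements that
  // lack the field (e.g. {}) are skipped rather than failing the whole match.
  return arr.some(
    (elem) => elem !== null && typeof elem === "object" && elem[inner] === value
  );
}

const docs = [
  { a: [{}, { b: "foo" }] },
  { a: [{ b: "foo" }, {}] },
];
// Both documents should match, wherever the element containing "b" sits.
console.log(docs.map((d) => matchesDotPath(d, "a", "b", "foo"))); // [ true, true ]
```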

      From my initial investigation, this seems to happen because of the specific "fold" expression that is passed to TraverseStage in "sbe_stage_builder_filter.cpp", and because of how that expression behaves when it is translated to bytecode and executed in the VM.

      The "fold" expression passed to TraverseStage is "logicOr(_outField, _outFieldInner)". I looked at the bytecode generated for this expression and at how it behaves. First, note that no implicit "cast to bool" is applied to the arguments of logicOr(); their values are fed to logicOr() directly. The generated bytecode doesn't quite do what I would have expected. In summary, it does the following:
      1) If LHS is Nothing or Boolean True, logicOr() returns LHS.
      2) If LHS is neither Nothing nor Boolean True, logicOr() returns RHS.
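
The two rules above can be modeled as follows. This is a sketch written from my summary, not a transcription of the VM bytecode; representing Nothing as a JavaScript Symbol is an assumption of the model.

```javascript
// Model of the logicOr() semantics summarized above (not the real VM code).
// Nothing is represented by a unique Symbol; other values are plain JS values.
const Nothing = Symbol("Nothing");

function logicOr(lhs, rhs) {
  // Rule 1: LHS is Nothing or Boolean true -> return LHS unchanged.
  if (lhs === Nothing || lhs === true) return lhs;
  // Rule 2: otherwise return RHS as-is; no coercion to boolean happens.
  return rhs;
}

console.log(logicOr(Nothing, "foo") === Nothing); // true
console.log(logicOr(false, "foo"));               // foo
console.log(logicOr(true, Nothing));              // true
```

Note that rule 1 makes Nothing "sticky": once LHS is Nothing, the value of RHS is never consulted.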

      Now consider the example in "Steps to Reproduce" and why db.c.find({"a.b":"foo"}) doesn't match the first document in c even though it should. When TraverseStage applies the projection (getField("b")) to the first element of the array in field "a", getField("b") returns Nothing, because that embedded document has no field named "b". Because this is the first element of the array, TraverseStage doesn't evaluate the fold expression; instead, it stores Nothing directly in the "_outField" slot. Next, TraverseStage applies the projection (getField("b")) to the second element of the array, and getField("b") returns "foo". TraverseStage then evaluates the fold expression, which in this case is 'logicOr(Nothing,"foo")'. Given the logicOr() behavior described above, 'logicOr(Nothing,"foo")' returns Nothing. As a result, the first document in c doesn't pass the filter even though it should.
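
The walk-through above can be reproduced with a small model. traverse(), getField(), and the Symbol used for Nothing are hypothetical stand-ins, not SBE's actual C++ API, and the model covers only the first document from the repro, which is the case walked through above. It assumes the fold is applied exactly as described: the first element's projected value is stored directly, and each later element is combined with the slot via logicOr().

```javascript
// Hypothetical model of the TraverseStage walk-through (not SBE source code).
const Nothing = Symbol("Nothing");

// getField(): returns Nothing when the field is absent from the object.
function getField(obj, name) {
  return obj !== null && typeof obj === "object" &&
    Object.prototype.hasOwnProperty.call(obj, name)
    ? obj[name]
    : Nothing;
}

// logicOr() as summarized in the two rules earlier in this report.
function logicOr(lhs, rhs) {
  return lhs === Nothing || lhs === true ? lhs : rhs;
}

// The first element's projection is stored directly in the output slot;
// each later element is combined with the slot via the fold expression.
function traverse(arr, project, fold) {
  let outField = Nothing;
  arr.forEach((elem, i) => {
    const inner = project(elem);
    outField = i === 0 ? inner : fold(outField, inner);
  });
  return outField;
}

// First document from the repro: { "a" : [ { }, { "b" : "foo" } ] }
const result = traverse([{}, { b: "foo" }], (e) => getField(e, "b"), logicOr);
console.log(result === Nothing); // true: logicOr(Nothing, "foo") -> Nothing
```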

      Given this, in order to fix this bug I think we need to do one or more of the following:
      1) Change the fold expression we pass to TraverseStage in "sbe_stage_builder_filter.cpp".
      2) Change logicOr() to implicitly coerce both of its arguments to boolean first.
      3) Change logicOr() to behave differently when one or both of its arguments is Nothing.
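
As one illustration of option 3, here is a hypothetical variant of logicOr() that treats Nothing as "no result yet" rather than letting it absorb the fold. The name logicOrSkipNothing() is made up for this sketch, and it only changes how Nothing propagates; it does not settle whether the fold should see raw field values or booleans (options 1 and 2).

```javascript
// Hypothetical sketch of fix option 3 (not actual SBE code): Nothing is
// treated as "no result yet" instead of absorbing the rest of the fold.
const Nothing = Symbol("Nothing");

function logicOrSkipNothing(lhs, rhs) {
  if (lhs === Nothing) return rhs; // nothing accumulated yet: take RHS
  if (rhs === Nothing) return lhs; // this element produced nothing: keep LHS
  return lhs === true ? lhs : rhs; // otherwise same as the existing logicOr()
}

// reduce() without an initial value uses the first element's value directly
// and folds the rest, like the TraverseStage behavior described above.
const fold = (values) => values.reduce(logicOrSkipNothing);

console.log(fold([Nothing, "foo"])); // foo
console.log(fold(["foo", Nothing])); // foo
```

Using Array.prototype.reduce() without an initial value keeps the model close to TraverseStage storing the first element's value directly; note that reduce() throws on an empty array, which a real implementation would need to handle separately.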

            Assignee:
            Drew Paroski (andrew.paroski@mongodb.com)
            Reporter:
            Drew Paroski (andrew.paroski@mongodb.com)
            Votes:
            0
            Watchers:
            2
