[SERVER-49943] [SBE] Fix edge case bug with 'find({"a.b.c": ..})' Created: 28/Jul/20  Updated: 29/Oct/23  Resolved: 19/Nov/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Drew Paroski Assignee: Melodee Li
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

> db.c.find({}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
{ "a" : [ { "b" : [ ] }, { "b" : [ { "c" : [5, 7] } ] } ] }

> db.adminCommand({setParameter: 1, internalQueryEnableSlotBasedExecutionEngine: false})
{ "was" : false, "ok" : 1 }
> db.c.find({"a.b.c":  {$eq: 5}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
{ "a" : [ { "b" : [ ] }, { "b" : [ { "c" : [5, 7] } ] } ] }
> db.c.find({"a.b.c":  {$in: [5]}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
{ "a" : [ { "b" : [ ] }, { "b" : [ { "c" : [5, 7] } ] } ] }
> db.c.find({"a.b.c":  {$size: 2}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
{ "a" : [ { "b" : [ ] }, { "b" : [ { "c" : [5, 7] } ] } ] }
> db.c.find({"a.b.c":  {$elemMatch: {$eq: 5}}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
{ "a" : [ { "b" : [ ] }, { "b" : [ { "c" : [5, 7] } ] } ] }

> db.adminCommand({setParameter: 1, internalQueryEnableSlotBasedExecutionEngine: true})
{ "was" : false, "ok" : 1 }
> db.c.find({"a.b.c":  {$eq: 5}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
> db.c.find({"a.b.c":  {$in: [5]}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
> db.c.find({"a.b.c":  {$size: 2}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }
> db.c.find({"a.b.c":  {$elemMatch: {$eq: 5}}}, {_id: 0})
{ "a" : [ { "b" : [ { "c" : [5, 7] } ] } ] }

Sprint: Query 2020-11-30
Participants:

 Description   

While doing some manual testing with SBE mode enable, I encountered an edge case involving dot notation where find() doesn't always match a document when it should.

See "Steps To Reproduce" for a specific example of the bug I encountered.

Upon investigation, I found that despite all the recent work (SERVER-49686, SERVER-49723, SERVER-49819) that added fillEmpty() calls in various places in "sbe_stage_builder_filter.cpp", it appears there are still situations where "Nothing" can wreak havoc.

In the example shown in "Steps To Reproduce", the second document in collection c has an array field 'a' that contains two objects, where the first object has a field 'b' whose value is the empty array, and where the second object has an array field 'b' containing an object with a field 'c'. The command 'find({"a.b.c": 2})' should match the second document, but it doesn't because of an issue with the outermost TraverseStage and the TraverseStage at the second level of nesting.

Here is the SBE plan we currently generate for 'find({"a.b.c": 2})':

 

filter {s11} 
traverse s11 s10 s4 {s11 || s10} {s11} 
in 
    traverse s10 s9 s5 {s10 || s9} {s10} 
    in 
        project [s9 = fillEmpty (s8, false) || fillEmpty (isArray (s6), false) && fillEmpty (s6 == 2, false)] 
        traverse s8 s7 s6 {s8 || s7} {s8} 
        in 
            project [s7 = fillEmpty (s6 == 2, false)] 
            limit 1 
            coscan 
        from 
            project [s6 = getField (s5, "c")] 
            limit 1 
            coscan 
    from 
        project [s5 = getField (s4, "b")] 
        limit 1 
        coscan 
from 
    project [s4 = getField (s1, "a")] 
    scan s1 s2 [] @"f1259d4d-36c2-486b-a796-4a0df6ef1832" 

 

The "from" clause of the TraverseStage at the second level of nesting is 's5 = getField (s4, "b")'. When processing the second document in collection c, this TraverseStage gets invoked twice. For the first invocation, field "b"s value is the empty array, and so the TraverseStage's fold expression is never executed and the TraverseStage produces Nothing. For the second invocation, field "b"s value is an array containing an object with field "c" whose value is true, and so the TraverseStage produces Boolean true (because the innermost TraverseStage will visit field "c", compare 2==2, and produce Boolean true). When the outermost TraverseStage applies its fold expression, it ANDs together Nothing and Boolean True which produces Nothing, which causes the second document in collection c to not match (even though it should match).

Three possible ways to fix this that come to mind:
1) Change the fold expression passed to TraverseStage to use fillEmpty()
2) Change how TraverseStage's folding works. If TraverseStage gave us a way to specify an "initial" value, we could avoid producing Nothing in the empty array case. (For an example of folding with an "initial value", take a look at how Haskell's foldr function works: https://wiki.haskell.org/Data.Foldable.foldr ).
3) Wrap each TraverseStage with a ProjectStage that will convert Nothing to Boolean False.



 Comments   
Comment by Githook User [ 19/Nov/20 ]

Author:

{'name': 'Melodee Li', 'email': 'melodeeli98@gmail.com', 'username': 'melodeeli98'}

Message: SERVER-49943 [SBE] Fix edge case bug with 'find(

{a.b.c: ..}

)'
Branch: master
https://github.com/mongodb/mongo/commit/fdd81d023ce1888a04cfe98ceaf9fd728f3e21cd

Comment by Martin Neupauer [ 28/Jul/20 ]

There is no need to change Traverse. so #1 or #3.

Comment by David Storch [ 28/Jul/20 ]

I actually like proposed fix #2. martin.neupauer I'm curious if you have opinions on the right fix, and in particular what you think about extending the traverse stage to allow specifying an EExpression which is executed to obtain the initial value of the fold. In this particular case, the expression would be an EConstant containing a false value.

Generated at Thu Feb 08 05:21:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.