[SERVER-49723] [SBE] Using find() to query values in array field is broken in some cases Created: 20/Jul/20  Updated: 29/Oct/23  Resolved: 22/Jul/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug Priority: Major - P3
Reporter: Drew Paroski Assignee: Drew Paroski
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-49721 [SBE] Using dot notation in find() to... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

> db.adminCommand({setParameter: 1, internalQueryEnableSlotBasedExecutionEngine: false})
{ "was" : false, "ok" : 1 }
> db.c.find({}, {_id: 0})
{ "a" : [ 123, "foo" ] }
{ "a" : [ "foo", 123 ] }
> db.c.find({a: "foo"}, {_id: 0})
{ "a" : [ 123, "foo" ] }
{ "a" : [ "foo", 123 ] }

> db.adminCommand({setParameter: 1, internalQueryEnableSlotBasedExecutionEngine: true})
{ "was" : false, "ok" : 1 }
> db.c.find({a: "foo"}, {_id: 0})
{ "a" : [ "foo", 123 ] }

Sprint: Query 2020-07-27
Participants:

 Description   

This issue is pretty similar to SERVER-49721.

When looking at possible ways to refactor some of the logic in "sbe_stage_builder_filter.cpp", I tested out some cases that involved using find() to query the contents of an array field, and I noticed it was broken in some cases.

Given a document with a field "a" that contains an array with two values of different types, it appears impossible for an equality predicate to ever match against the second element in the array. See the "Steps To Reproduce" section for a specific example.

From my initial investigation, it seems this is happening because of the specific "fold" expression that is being passed into TraverseStage in "sbe_stage_builder_filter.cpp" and how this expression behaves when it is translated to bytecode and executed in the VM.

The "fold" expression being passed into TraverseStage is "logicOr(traversePredicateVar, elemPredicateVar)". I looked at the bytecode that is generated for this expression and how it behaved. First, it is important to note that there is no implicit "cast to bool" happening to the arguments passed to logicOr(); the values of the arguments are fed directly to logicOr(). The generated bytecode doesn't do quite what I would have expected. Here is a summary of what the bytecode essentially does:
1) If LHS is Nothing or Boolean True, then logicOr() returns LHS.
2) If LHS is neither Nothing nor Boolean True, then logicOr() returns RHS.
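
The two rules above can be modeled in a few lines of JavaScript. This is only an illustrative sketch of the behavior described in this ticket, not the server's bytecode: the `Nothing` sentinel and the `logicOrObserved` name are hypothetical stand-ins.

```javascript
// Hypothetical sentinel standing in for SBE's "Nothing" value (not server code).
const Nothing = Symbol("Nothing");

// Models the behavior summarized above:
//   1) if LHS is Nothing or Boolean true, return LHS;
//   2) otherwise return RHS.
// Note that neither argument is coerced to a boolean first.
function logicOrObserved(lhs, rhs) {
  if (lhs === Nothing || lhs === true) {
    return lhs;
  }
  return rhs;
}
```

Under this model, `logicOrObserved(Nothing, true)` yields `Nothing` rather than `true`, because a `Nothing` on the left-hand side is returned before the right-hand side is ever consulted.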

Now let's consider the example in "Steps To Reproduce" and look at why db.c.find({"a":"foo"}) doesn't match the first document in c even though it should. When TraverseStage applies the projection (== "foo") to the first element of the array in field "a", the == operator returns Nothing (see the implementation of the eq instruction in "vm.cpp" to see why). Because this is the first element of the array, TraverseStage doesn't evaluate the fold expression; instead it just stores Nothing directly in the "_outField" slot. Next, TraverseStage applies the projection (== "foo") to the second element of the array, and the == operator returns Boolean True. TraverseStage then evaluates the fold expression, which in this case is 'logicOr(Nothing,True)'. Because of the behavior of logicOr() that I described above, 'logicOr(Nothing,True)' returns Nothing. As a result, the first document in c doesn't pass the filter even though it should.
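
The walk-through above can be replayed with a small JavaScript model. Everything here is a hypothetical sketch: `Nothing`, `eqObserved`, `logicOrObserved`, `traverseObserved` and `matchesObserved` are illustrative names, and the rule that `eqObserved` returns `Nothing` for operands of different JavaScript types is an assumption standing in for the eq behavior in "vm.cpp".

```javascript
// Hypothetical sentinel standing in for SBE's "Nothing" value (not server code).
const Nothing = Symbol("Nothing");

// logicOr() as described in this ticket: no coercion of either argument.
function logicOrObserved(lhs, rhs) {
  return (lhs === Nothing || lhs === true) ? lhs : rhs;
}

// Assumption: comparing values of different types yields Nothing.
function eqObserved(lhs, rhs) {
  if (typeof lhs !== typeof rhs) {
    return Nothing;
  }
  return lhs === rhs;
}

// Models the traversal described above: the first element's result is stored
// directly in the output slot; later results are combined using the fold.
function traverseObserved(array, predicate, fold) {
  let out = Nothing;
  let first = true;
  for (const elem of array) {
    const result = predicate(elem);
    if (first) {
      out = result;
      first = false;
    } else {
      out = fold(out, result);
    }
  }
  return out;
}

// A document passes the filter only if the traversal produces Boolean true.
function matchesObserved(doc, value) {
  const out = traverseObserved(doc.a, (e) => eqObserved(e, value), logicOrObserved);
  return out === true;
}

const docs = [{ a: [123, "foo"] }, { a: ["foo", 123] }];
const matched = docs.filter((d) => matchesObserved(d, "foo"));
console.log(JSON.stringify(matched));
```

In this model only the second document passes the filter for "foo", which mirrors the SBE output in "Steps To Reproduce".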

Given this, in order to fix this bug I think we need to do one or more of the following:
1) Change the fold expression we pass to TraverseStage in "sbe_stage_builder_filter.cpp".
2) Change logicOr() to implicitly coerce both of its arguments to boolean first.
3) Change logicOr() to behave differently when one or both of its arguments is Nothing.
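
As a rough illustration of option 2 only, the sketch below coerces both arguments to boolean before combining them. It is not the change that was committed for this ticket; `Nothing`, `coerceToBool`, `logicOrCoerced` and `foldResults` are hypothetical names, and treating Nothing as false is an assumption made for the example.

```javascript
// Hypothetical sentinel standing in for SBE's "Nothing" value (not server code).
const Nothing = Symbol("Nothing");

// Assumption for this sketch: only Boolean true coerces to true;
// Nothing and every other value coerce to false.
function coerceToBool(value) {
  return value === true;
}

// Option 2 sketch: coerce both arguments first, then apply a plain OR.
function logicOrCoerced(lhs, rhs) {
  return coerceToBool(lhs) || coerceToBool(rhs);
}

// Folds per-element predicate results the way the ticket describes:
// the first result is taken as-is, the rest are combined using the fold.
function foldResults(results, fold) {
  return results.slice(1).reduce((acc, r) => fold(acc, r), results[0]);
}

console.log(foldResults([Nothing, true], logicOrCoerced));
console.log(foldResults([true, Nothing], logicOrCoerced));
```

With the coerced variant, a Nothing result from an earlier array element no longer hides a later true result. A single-element array still stores its result without folding, so whatever consumes the output would still need to handle Nothing.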



 Comments   
Comment by Githook User [ 22/Jul/20 ]

Author:

{'name': 'Drew Paroski', 'email': 'drew.paroski@mongodb.com', 'username': 'paroski'}

Message: SERVER-49723 [SBE] Using find() to query values in array field is broken in some cases
Branch: master
https://github.com/mongodb/mongo/commit/19db8ebda506ddcd3f4e477fb9bd4228867e6ca3

Generated at Thu Feb 08 05:20:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.