[SERVER-58100] sbe_cmd.js does not check that queries return expected results, and the queries are returning the wrong results Created: 25/Jun/21  Updated: 27/Oct/23  Resolved: 21/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Execution
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Sprint: Query Execution 2021-07-26, QE 2021-08-09, QE 2021-08-23, QE 2021-09-06, QE 2021-09-20, QE 2021-10-04, QE 2021-10-18
Participants:

 Description   

The sbe_cmd.js test is our one test for checking that the test-only SBE command does something reasonable. This line in the test is verifying that the command runs correctly, but not that it returns the correct results. Also, this line is written incorrectly, since assert(db._sbe(slotBasedPlan)); is checking that the resulting DBCommandCursor is not undefined, which should always be true. Instead, we should call .itcount() or .toArray() on the cursor and check that the returned results are correct.

I tried making this change, and discovered that the returned results are in fact not correct! It appears that the SBE plan returned by explain does not correctly roundtrip and result in the exact same execution plan. It is unclear how much effort we want to spend on the SBE test command in general, but we could consider looking into why exactly the SBE string returned by explain does not return the correct results when run via the SBE command.



 Comments   
Comment by David Storch [ 21/Mar/23 ]

The sbe_cmd.js test is gone because we deleted the SBE command in SERVER-65699. Closing as "Gone Away".

Comment by Ethan Zhang (Inactive) [ 06/Oct/21 ]

We discussed this today during the triage meeting and are inclined to backlog this. Will wait for another week to make a final decision after Kyle comes back from his vacation.

Comment by Kyle Suarez [ 27/Sep/21 ]

After a lot of investigation by Andrii, we're having a hard time getting to the bottom of this issue. However, since the sbe command is test-only, it should not affect the status of SBE as the default execution engine. I am going to remove this from the SBE Correctness epic and we will find time to take a look at this later.

Comment by David Storch [ 12/Jul/21 ]

andrii.dobroshynski, ah, this makes a lot of sense! Thanks for the explanation. It would be nice if you could roundtrip the SBE string produced by explain through the SBE command and get the same results. However, this would most likely require changing the SBE command to accept both the "slots" and "stages" components of the explain output, e.g. all of this stuff:

MongoDB Enterprise > db.c.find().explain().queryPlanner.winningPlan.slotBasedPlan
{
	"slots" : "$$RESULT=s3 $$RID=s4 env: { s1 = TimeZoneDatabase(America/Grand_Turk...Asia/Pontianak) (timeZoneDB), s2 = 1626125044323 (NOW) }",
	"stages" : "[1] scan s3 s4 none none none none [] @\"5cd4bdd4-3d36-401b-b8d2-4410ebb25a71\" true false "
}

The "slots" field encodes the information that the $$RESULT slot is s3. Making this improvement is out of scope for this ticket, since there's no sense spending too much time improving the test-only SBE command right now. I think we should just fix the test by making it call itcount() and leave any further improvements as future work. It would be good to leave a comment explaining that the SBE command cursor is expected to return empty documents, because we currently don't have an easy way to programmatically instruct the SBE command about which slot is the $$RESULT slot.

Comment by Andrii Dobroshynski (Inactive) [ 09/Jul/21 ]

As a more concrete example of what I'm talking about here, here is the behavior that I can currently observe on master – Assuming we have a collection with two documents

{ "_id" : 0, "val" : 1 }
{ "_id" : 1, "val" : 1 } 

The slotBasedPlan SBE plan strings looks something like this

"[1] scan s3 s4 none none none none [] @\"671c09e5-abc2-435c-91ae-d6de4d43a100\" true false " 

Then running this exact plan string through the internal `db._sbe` command I currently get this result

> db._sbe("[1] scan s3 s4 none none none none [] @\"671c09e5-abc2-435c-91ae-d6de4d43a100\" true false ")
{  }
{  }

so two results as expected for the two documents but not populated. But, if we change the s3 slot to $$RESULT and re-run via the SBE command, then we get the expected results

> db._sbe("[1] scan $$RESULT s4 none none none none [] @\"671c09e5-abc2-435c-91ae-d6de4d43a100\" true false ")
{ "_id" : 0, "val" : 1 }
{ "_id" : 1, "val" : 1 } 

We decided the amount of effort to implement this result slot stuff wouldn't be worth it, therefore we can't really compare the arrays to each other, but I guess as a somewhat compromise we could just do `itcount()` like you mentioned and compare the array size? What do you think?

Comment by Andrii Dobroshynski (Inactive) [ 09/Jul/21 ]

david.storch I recall this coming up when we were adding test coverage to the SBE command, and one of the things that I found was that if we execute an SBE plan string via this command, we get back empty document results. Anton pointed out that this is because we don't have a way to specify the result and record id slots like `$$RESULT` therefore the internal command doesn't know where to put the results. I wonder, is this what you found happening or is there something else going on? Because of the stuff with result slots, we would get empty results back so we decided to just assert that the call succeeds, which would tell us that the string plan is parsed OK and executed without issues, which I guess is most of what we would care about with the SBE command at the time.

Generated at Thu Feb 08 05:43:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.