[SERVER-59889] Add compatibility tests for $group pushdown behavior Created: 10/Sep/21  Updated: 18/Jan/22  Resolved: 18/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Eric Cox (Inactive) Assignee: Yoon Soo Kim
Resolution: Won't Fix Votes: 0
Labels: sbe
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-60030 Wrong SBE PlanStage tree is generated... Closed
Sprint: QE 2021-09-20
Participants:

 Description   

This work adds a JS test suite that compares the behavior between execution of $group in the classic engine vs $group pushdown into sbe. We could consider a randomized testing approach for data generation, not for queries. Randomized queries makes it too hard to investigate a failure.

The main goal here is to expose incompatible behaviors between the behaviors of $group in classic engine and ones of $group in SBE as soon as possible and make them easy to investigate.



 Comments   
Comment by Yoon Soo Kim [ 18/Jan/22 ]

I'd rather wait for how the test working group discussion goes. I believe that this ticket is closely related to the discussion because what I was trying to do was to define a combinatorial test framework and on top of that, define test matrix for $group. Will resolve this as "Won't Fix".

But please, feel free to reopen this ticket when the necessity arises.

Comment by Yoon Soo Kim [ 22/Sep/21 ]

Sure ethan.zhang, will add it.

Comment by Ethan Zhang (Inactive) [ 22/Sep/21 ]

yoonsoo.kim

  1. no missing value vs missing value(s) vs invalid value(s) for a type/field

Should we add "undefined" to this as well? 

Comment by Ethan Zhang (Inactive) [ 21/Sep/21 ]

Just discussed this in standup. The combinations are deterministic, but the test data is not. Yoonsoo will make changes to make it deterministic.

Comment by Yoon Soo Kim [ 21/Sep/21 ]

Sure, steve.la. I think basically there are 3 groups of tests for $group.

  1. group-by key(s) tests: jstests/aggregation/sources/group/*.js
  2. accumulator tests: jstests/aggregation/accumulators/*.js
  3. specific tests for bugs or behaviors: many tests in jstests/aggregation/bugs directory are for $group and accumulators but not all of them.

Yes, existing $group test suites eventually validate expected results for test cases that we are explicitly testing if we run those tests with SBE turned on. The questions are whether they are enough and it would be easy to investigate test failures.

I think we have the same kind of test suites for find() but AFAIK, we've had a long tail of correctness issues for SBE find() many of which have been found by fuzzer tests. Failures with the fuzzer test are notorious for being hard to investigate among query team. I believe that the one reason why we have a long tail for SBE correctness for find() is that fuzzer tests are inherently randomized tests for queries. So, it may take a long time to reveal all possible correctness issues and also it's very time-consuming to make a minimal repro out of a generated failed fuzzer test file, which is a starting point of fixing an issue.

This ticket is an attempt to expose such incompatibilities as early as possible by generating full $group spec combinations out of various test dimensions and running those $group specs on both the classic engine and the SBE engine and compare the SBE engine result to the classic engine result. This ticket is also an attempt to help engineers to investigate test failures by attaching a repro script for each test failure. Usually a js test stops running as soon as the first failed test case is encountered but this test suite will keep running until all test combinations of $group spec are completed and report all failed test cases with repro scripts at the end of execution.

While I have been working on this ticket, I found two incompatibilities though it's still in the early stage of development. One is related to an error code for a case that a numeric expression like $add fails with non-numeric value at runtime and the other is $sum result for SBE is different from one for classic engine for a certain input data. I think the first one is ok to include to known incompatibilities and the second one is worth investigating further.

I'm not sure whether this explanation makes sense to you and you think this test suite would be doable and useful. Please, let me know.

If we think that existing $group test suites and agg fuzzer tests are enough, I can move on to work on other items like improving $group performance. 

Comment by Yoon Soo Kim [ 16/Sep/21 ]

Found a bug.

Comment by Yoon Soo Kim [ 15/Sep/21 ]

Test dimensions to consider are:

  1. no accumulator vs one accumulator vs multiple accumulators
  2. null _id vs scalar _id vs document _id with one field vs document _id with multiple fields
  3. a simple field path vs a complex field path vs expression for each accumulator
  4. no missing value vs missing value(s) vs invalid value(s) for a type/field vs undefined for a field
  5. data types: int vs long vs decimal vs small string vs long string vs date vs timestamp vs array vs mixed-type array vs boundary values (MIN/MAX per type, NaN, infinity) vs mixed types
  6. default collator vs explicit collator

For dimension 1 & 2 & 3, we could generate $group stages. This list should cover as many combinations as possible, avoiding redundant test cases. For 4 & 5 & 6, we can leverage random data generation and still full combinations should be covered.

In the context of this test suite,

  • not being supported in SBE is NOT an error.
  • we should report an easy repro step (or steps) with generated data, the classic engine's result, the SBE's result, the winning plan, SBE PlanStage tree, and $group stage when a failure happens and the test suite should make it as easy as possible to investigate a test failure.
  • The classic engine's result is always considered as "correct expected behavior" and this test suite can't verify whether the classic engine's behavior is correct or not.
  • if a SBE result is different from the classic engine's result in any way, it's an error.
  • Whenever we decide that a difference is an intended behavior, we add a known difference as a custom compare function so that the comparison can succeed.
  • This test suite only verifies $group pipeline stage behavior.
  • This test suite does not test whether $group is pushed down or not.
Generated at Thu Feb 08 05:48:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.