[SERVER-46689] $bucketAuto should optimize its expressions Created: 06/Mar/20 Updated: 06/Dec/22 Resolved: 19/May/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Percy | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | qopt-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Query |
| Participants: | |
| Description |
|
In This has caused us to lose some test coverage, because queries using $bucketAuto in 4.2 vs 4.4 can return different results. A lot of the cases involve one query erroring and the other not, but some cases may involve two slightly different results. Giving $bucketAuto an optimize() method on 4.2 would be a small code change, would make some queries faster, and would make our 4.4 tests greener. But it could also cause subtle behavior changes. |
| Comments |
| Comment by David Percy [ 13/Mar/20 ] | ||||
|
optimize(), as currently implemented, can change the results of a query, because some operators like $add and $multiply are marked as associative even though they are not. $add is not associative because floating-point addition is not associative. An expression like 1 + 1e99 + -1e99 is parsed as (1 + 1e99) + -1e99, and evaluates to 0 because of rounding error. Re-associating it to 1 + (1e99 + -1e99) removes the rounding error and changes the result to 1. When optimize() sees an $add, it reorders the arguments to put the constant arguments together, so it can constant-fold them. You can observe this by changing a constant to a variable using $let:
In the first example, the $add gets optimized to {$const: 0} because all its arguments are constant, but in the second example, it gets optimized to {$add: ["$$one", 0]}. | ||||
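The re-association effect described in this comment can be reproduced outside the server. The following is a minimal Python sketch, not the server's code: `add_left_to_right` and `fold_constants` are hypothetical helpers that only mimic "evaluate in order" versus "group the constant arguments, fold them, then add the rest", and the `is_constant` flags are an assumption standing in for whether an argument is a literal or a variable such as "$$one".

```python
# Floating-point addition is not associative: grouping changes the rounding.
left_to_right = (1 + 1e99) + -1e99  # the 1 is lost when added to 1e99
regrouped = 1 + (1e99 + -1e99)      # the large terms cancel exactly first


def add_left_to_right(args):
    """Sum the arguments strictly in order, as an unoptimized $add would."""
    total = 0
    for value in args:
        total += value
    return total


def fold_constants(args, is_constant):
    """Hypothetical stand-in for the reordering described above.

    Constant arguments are grouped and summed first (constant folding);
    the folded value is then added after the non-constant arguments.
    """
    constants = [a for a, c in zip(args, is_constant) if c]
    others = [a for a, c in zip(args, is_constant) if not c]
    folded = add_left_to_right(constants)
    return add_left_to_right(others + [folded])


args = [1, 1e99, -1e99]

# All arguments constant: folding keeps the original order, rounding error stays.
all_constant = fold_constants(args, [True, True, True])

# First argument is a variable: the two large constants are folded together
# first, so they cancel before the 1 is added.
first_is_variable = fold_constants(args, [False, True, True])

print(left_to_right, regrouped, all_constant, first_is_variable)
```

Under these assumptions, the all-constant case matches the left-to-right result, while the case with one variable matches the regrouped result, which is the discrepancy the comment describes.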
| Comment by David Storch [ 13/Mar/20 ] | ||||
|
david.percy it makes sense to me that this difference between 4.2 and 4.4 could cause one query to error and the other not in the agg multiversion fuzzer. But how can it cause two slightly different results? If the result sets are indeed different, then it seems like there is a bug in the server. Alternatively, perhaps the result sets should be considered equal by the logic in the fuzzer? |