[SERVER-63099] Constant folding should not assume arithmetic is associative or commutative in the general case Created: 28/Jan/22 Updated: 29/Oct/23 Resolved: 11/Jul/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Davis Haupt (Inactive) | Assignee: | Davis Haupt (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | query-director-triage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||||||||||||||||||
| Backport Requested: |
v6.0
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Steps To Reproduce: |
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | QO 2022-02-07, QO 2022-02-21 | ||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||
| Linked BF Score: | 149 | ||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Currently, arithmetic aggregation expressions like $multiply are marked as associative and commutative which allows the constant folding optimization to rearrange constants to combine as many of them as possible during optimization. There are three reasons this is not a valid optimization in general: 1. For integers, this is not a safe optimization due to overflow. For example, 1073741824*-1*2 == -2147483648, but 1073741824*2*-1 overflows. 2. Floating point arithmetic is not generally associative. See this paper, specifically:
3. Type coercions occur when performing an operation on multiple types. For example, 1 + 0.5 + 2 will convert 1 from an integer to a float before adding it to 0.5. These type coercions are dependent on the order of operations and could change the result. See the "steps to reproduce" section for an example with `NumberDecimal`. Because we can't know if a fieldpath will resolve to a specific number type or a value within a specific range, we cannot rule out any kind of overflow or type coercion. After investigation, this only applies to ExpressionAdd and ExpressionMultiply. As part of the fix for this ticket, we'll implement a left-to-right constant folding optimization that respects the order of operations during query execution while folding as many constants as is safe. For example, [c1, c2, c3, "$v", c5, c6] will be folded to [c', "$v", c5, c6]. |
| Comments |
| Comment by Liubov Molchanova [ 26/Jul/22 ] | ||||||||||||||||||||
|
Requesting Backport as the issue reproduced in BFG-1274926 on v6.0. | ||||||||||||||||||||
| Comment by Githook User [ 11/Jul/22 ] | ||||||||||||||||||||
|
Author: {'name': 'Davis Haupt', 'email': 'davis.haupt@mongodb.com', 'username': 'davish'}Message: | ||||||||||||||||||||
| Comment by Githook User [ 23/Jun/22 ] | ||||||||||||||||||||
|
Author: {'name': 'Davis Haupt', 'email': 'davis.haupt@mongodb.com', 'username': 'davish'}Message: Merge remote-tracking branch 'origin/master' into davish/ | ||||||||||||||||||||
| Comment by Githook User [ 23/Jun/22 ] | ||||||||||||||||||||
|
Author: {'name': 'Davis Haupt', 'email': 'davis.haupt@mongodb.com', 'username': 'davish'}Message: Merge remote-tracking branch 'origin/master' into davish/ | ||||||||||||||||||||
| Comment by Davis Haupt (Inactive) [ 10/May/22 ] | ||||||||||||||||||||
|
Satisfactory integration tests for this PR are blocked on | ||||||||||||||||||||
| Comment by Davis Haupt (Inactive) [ 03/Mar/22 ] | ||||||||||||||||||||
|
david.percy I agree that that's a valid way to interpret it, but to give the other interpretation, the distinction that geert.bosch and anna.wawrzyniak made in the type promotion discussion last month was that users provide arguments to expressions like $add and $multiply in an explicit ordering. Accumulators like $sum, on the other hand, operate over a stream of documents that the user isn't explicitly providing in any order, and so it's much less unexpected that the optimizer could choose the ordering of the stream of documents based on an index, for example. | ||||||||||||||||||||
| Comment by Timour Katchaounov [ 16/Feb/22 ] | ||||||||||||||||||||
|
The above benchmark clearly shows that in some cases (that can be quite common) there will be a ~20% performance regression. This is very substantial, and I believe we cannot push such a change. | ||||||||||||||||||||
| Comment by Davis Haupt (Inactive) [ 11/Feb/22 ] | ||||||||||||||||||||
|
The new Genny workload in PERF-2796 seems to 1) measure the impact of constant folding well and 2) show that adding a left-associative constant folding optimization can bring performance back up to the level seen on master currently if users manually re-arrange their operands to put all fieldpaths at the end of the operand list.
Note that the following results are from a single run of each configuration and so some variance is expected, but the orders of magnitude are what is significant. All numbers are in nanoseconds.
| ||||||||||||||||||||
| Comment by Davis Haupt (Inactive) [ 10/Feb/22 ] | ||||||||||||||||||||
|
Opened PERF-2796 (with a PR here: https://github.com/mongodb/genny/pull/617) to create a new workload to measure constant folding speedups and get some baseline metrics on performance before fixing correctness. I'm also planning on adding a new DSI configuration variant with optimization disabled to compare metrics with the additional left-associative constant folding logic that's to be added. | ||||||||||||||||||||
| Comment by Davis Haupt (Inactive) [ 28/Jan/22 ] | ||||||||||||||||||||
|
The simplest fix would be to change isAssociative() and isCommutative() for multiplication and addition. But there should be some discussion on whether there's a way to enable these optimizations in special cases where it's known to not cause overflow or trigger a type conversion. |