[DOCS-15241] [SERVER] Constant folding not commutative Created: 13/Apr/22  Updated: 13/Nov/23  Resolved: 08/Feb/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 6.1.0-rc0, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Dave Cuthbert (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-63099 Constant folding should not assume ar... Closed
Participants:
Days since reply: 1 year, 30 weeks, 2 days ago
Epic Link: DOCSP-11701

 Description   

Original title: [SERVER] Investigate changes in SERVER-63099: Constant folding should not assume arithmetic is associative or commutative in the general case

------

Original Downstream Change Summary

Users with queries that multiply or add many constants together along with some fieldpaths in an aggregation pipeline using the $add and $multiply expressions may find that their queries become slower after SERVER-63099 if their fieldpaths are at the beginning of the argument list. See JIRA comments on this ticket for examples.

After SERVER-63099 is merged, users will be able to speed up their queries back to pre-fix performance by moving any fieldpath references to the end of the list of arguments. For example, $multiply: [1, "$v", 2, "$w", 3, 4] can be rewritten to $multiply: [1, 2, 3, 4, "$v", "$w"] to speed the query up again.

The fact that addition and multiplication with these expressions is now defined to be left-to-right associative should be reflected in the documentation for $add and $multiply.

Description of Linked Ticket

Currently, arithmetic aggregation expressions like $multiply are marked as associative and commutative which allows the constant folding optimization to rearrange constants to combine as many of them as possible during optimization.

There are three reasons this is not a valid optimization in general:

1. For integers, this is not a safe optimization due to overflow. For example, 1073741824*-1*2 == -2147483648, but 1073741824*2*-1 overflows.

2. Floating point arithmetic is not generally associative. See this paper, specifically:

Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)z has a totally different answer than x(y+z) when x = 1030, y = -1030 and z = 1 (it is 1 in the former case, 0 in the latter). The importance of preserving parentheses cannot be overemphasized.

3. Type coercions occur when performing an operation on multiple types. For example, 1 + 0.5 + 2 will convert 1 from an integer to a float before adding it to 0.5. These type coercions are dependent on the order of operations and could change the result. See the "steps to reproduce" section for an example with `NumberDecimal`.

Because we can't know if a fieldpath will resolve to a specific number type or a value within a specific range, we cannot rule out any kind of overflow or type coercion.

So far I have only looked in to ExpressionMultiply, but ExpressionAdd and other arithmetic operations should be investigated to make sure they respect type coercions and overflow when constant folding.



 Comments   
Comment by Education Bot [ 11/Jul/22 ]

Fix Version updated for upstream SERVER-63099:
6.1.0-rc0

Generated at Thu Feb 08 08:12:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.