[SERVER-63018] Have ExpressionAdd::evaluate() return intermediate result for pipeline optimization constant folding Created: 26/Jan/22  Updated: 27/Oct/23  Resolved: 27/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.2.0, 4.2.17, 5.0.5, 5.1.1, 4.4.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Yoon Soo Kim Assignee: Davis Haupt (Inactive)
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-63099 Constant folding should not assume ar... Closed
is related to SERVER-65735 $add operator should not use DoubleDo... Closed
Operating System: ALL
Steps To Reproduce:

Starts mongod with --setParameter enableTestCommands=1

// Inserts any decimal value
db.opt.insert({n: NumberDecimal("0")})
 
// Turns off pipeline optimization
db.adminCommand({configureFailPoint: "disablePipelineOptimization", mode: "alwaysOn"});
 
// In mixed type $add, if there's any decimal input ($n), the final result is promoted to decimal.
// While doing addition, the double double sum algorithm accumulates non-decimal values into a separate nonDecimalTotal state.
// So, we don't lose precision here.
// Note that int64 can't be exactly expressed.
db.opt.aggregate({$project: {_id: 0, o: {$add: ["$n", 0, NumberLong("111111111111111111")]}}});
{ "o" : NumberDecimal("111111111111111111") }
 
// Enables pipeline optimization
db.adminCommand({configureFailPoint: "disablePipelineOptimization", mode: "off"});
 
// Here an interesting thing happens when a pipeline is optimized. This is the SBE result.
// While optimizing the pipeline, double 0 and NumberLong("111111111111111111") is optimized into a single double value.
// And the SBE binary $add algorithm promotes the double to a decimal with 15 digit precision and so we lose 3 digit precision here.
// This issue happens because the optimization passes the final result instead of the intermediate results in the form of
// decimalTotal and nonDecimalTotal.
db.adminCommand({getParameter: 1, internalQueryForceClassicEngine: 1});
{ "internalQueryForceClassicEngine" : false, "ok" : 1 }
db.opt.aggregate({$project: {_id: 0, o: {$add: ["$n", 0, NumberLong("111111111111111111")]}}});
{ "o" : NumberDecimal("111111111111111000") }
 
// This is the classic engine's result.
// The classic engine's double double sum algorithm promotes the optimization result double value to a decimal with 34 digit 
// precision. So, we keep the double value as close as possible to the addition result which is 111111111111111104.0.
// But the real issue here is that the optimization passes the final result instead of the intermediate results in the form of
// decimalTotal and nonDecimalTotal.
db.adminCommand({setParameter: 1, internalQueryForceClassicEngine: true});
{ "was" : false, "ok" : 1 }
db.opt.aggregate({$project: {_id: 0, o: {$add: ["$n", 0, NumberLong("111111111111111111")]}}});
{ "o" : NumberDecimal("111111111111111104") }

Participants:

 Description   

The proposed fix is to make ExpressionAdd::evaluate() return intermediate result in the form of decimalTotal and nonDecimalTotal for pipeline optimization constant folding. For example,

{decimalTotal: val1, nonDecimalTotal: val2}



 Comments   
Comment by Davis Haupt (Inactive) [ 27/Jun/22 ]

After committing SERVER-65735, the implementation for ExpressionAdd no longer keeps multiple running totals to compute a sum. Rather, only one total is used at a given time and all arithmetic happens as decimal arithmetic after the first decimal is encountered to preserve the input order of operations.

Comment by Davis Haupt (Inactive) [ 14/Feb/22 ]

On the immediate issue of resolving the BF that spawned this ticket, the difference between optimized and non-optimized pipelines will be addressed by SERVER-63099 which is in review currently. I'll move the BF to link to that other ticket.

The premise of this ticket is that the more precise intermediate value produced by $add should also be considered the more correct value. However, this seems to be an active area of discussion around type conversions and type promotions that is being addressed in the spec tracked by WRITING-10039, so I'm moving this ticket to blocked for now.

I also think there's a good chance that passing along the intermediate value would require a more extensive refactor of the constant folding logic. Right now, ExpressionNary::evaluate() is called during optimization as well as during the execution of the pipeline. We don't want the functionality during execution to change, just optimization, and so we probably would need a separate method that would expose the intermediate value rather than changing the evaluate() method.

Comment by Rushan Chen [ 11/Feb/22 ]

davis.haupt assigning this to you as you are investigating the solution for related issue. 

Comment by Rushan Chen [ 01/Feb/22 ]

Leaving this in the queue to be included as part of decimal precision semantics improvement.

Generated at Thu Feb 08 05:56:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.