[SERVER-72644] Investigate variadic aggregate expression regressions Created: 09/Jan/23  Updated: 27/Mar/23  Resolved: 27/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Mihai Andrei Assignee: Anna Wawrzyniak
Resolution: Done Votes: 0
Labels: pm2697-m3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-75332 Fix variadic aggregate expression reg... Backlog
related to SERVER-70806 Investigate 'RunMultiply' regressions Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Sprint: QE 2023-02-06, QE 2023-02-20, QE 2023-03-06, QE 2023-03-20, QE 2023-04-03
Participants:
Story Points: 2

 Description   
variadic_aggregate_expressions VariadicExpressionSetEqualsHundred.VariadicAggExpressionSetEqualsHundred -107.6554137
variadic_aggregate_expressions VariadicExpressionAddHundred.VariadicAggExpressionAddHundred -98.81069057
variadic_aggregate_expressions VariadicExpressionMultiplyHundred.VariadicAggExpressionMultiplyHundred -78.62192546
variadic_aggregate_expressions VariadicExpressionAddFifty.VariadicAggExpressionAddFifty -55.00343602
variadic_aggregate_expressions VariadicExpressionSetEqualsFifty.VariadicAggExpressionSetEqualsFifty -48.46890079
variadic_aggregate_expressions VariadicExpressionMultiplyFifty.VariadicAggExpressionMultiplyFifty -39.4838081 

(Note that this is likely related to the RunMultiply regressions: SERVER-70806.)

We should also consider whether we expect these to perform well at all. These expressions use an arguably unrealistic number of arguments, but they demonstrate that, at the time of filing this ticket, aggregate expressions scale poorly with the number of arguments.
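To make the shape of these benchmarks concrete, here is a minimal sketch that builds a one-stage `$project` pipeline applying a variadic operator to N operands. The helper name `buildVariadicPipeline` and the field names `f0`, `f1`, ... are hypothetical; the actual benchmark definitions (not shown in this ticket) may construct their pipelines differently.

```javascript
// Hypothetical helper: builds a one-stage aggregation pipeline that applies a
// variadic operator (e.g. "$add", "$multiply", "$setEquals") to n field paths.
function buildVariadicPipeline(op, n) {
  // Operand list "$f0", "$f1", ..., "$f<n-1>" (field names are illustrative only).
  const operands = Array.from({ length: n }, (_, i) => `$f${i}`);
  return [{ $project: { result: { [op]: operands } } }];
}

// Usage in the mongo shell (needs a running server and a populated collection):
// db.coll.aggregate(buildVariadicPipeline("$add", 100));
```

Because the expression grows linearly with N, any fixed per-operand overhead in the compiled plan is paid N times for every document.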



 Comments   
Comment by Mihai Andrei [ 27/Mar/23 ]

Closing as the investigation is complete; the work to fix this will be tracked in https://jira.mongodb.org/browse/SERVER-75332 

Comment by Anna Wawrzyniak [ 20/Mar/23 ]

Conclusions:

  • Using an MQL-specific setEquals builtin that performs the checks improves performance in the 50-100 argument cases by 73-126%. Compared to classic, this reduces the regression in the 100-argument case from -40% to -10% and yields a 10% improvement in the 50-argument case.
  • Using an MQL-specific add builtin that performs the checks improves performance in the 50-100 argument cases by 132-357%. Compared to classic, this turns the -43% regression in the 100-argument case into a 157% improvement.
  • Using an MQL-specific mul builtin that performs the checks improves performance by 10% in the 50-argument case but causes a -9% regression in the 100-argument case. The 100-argument regression is explained by different overflow handling and promotion to Decimal: SBE currently builds a balanced multiplication tree for 100+ arguments, which results in fewer Decimal operations than left-deep tree evaluation. The -40% regression between SBE and classic is mostly explained by (1) using Decimal instead of double on overflow and (2) using interpreted checks rather than "builtin" checks.
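The difference between a balanced multiplication tree and left-deep evaluation can be illustrated with a small sketch that builds both shapes over N operands and reports their depth. This models tree shape only, using a hypothetical halving split for the balanced case; it does not reproduce SBE's actual code generation or its Decimal overflow handling.

```javascript
// Left-deep: (((a0 * a1) * a2) * a3) ..., one new node per additional operand.
function leftDeep(operands) {
  return operands.reduce((acc, x) => ({ op: "*", left: acc, right: x }));
}

// Balanced: recursively split the operand list in half (hypothetical split rule).
function balanced(operands) {
  if (operands.length === 1) return operands[0];
  const mid = Math.floor(operands.length / 2);
  return {
    op: "*",
    left: balanced(operands.slice(0, mid)),
    right: balanced(operands.slice(mid)),
  };
}

// Tree depth; leaves (operand names) have depth 0.
function depth(node) {
  if (typeof node !== "object") return 0;
  return 1 + Math.max(depth(node.left), depth(node.right));
}

const args = Array.from({ length: 100 }, (_, i) => `a${i}`);
console.log(depth(leftDeep(args))); // 99
console.log(depth(balanced(args))); // 7
```

One plausible reading of the comment above: in a left-deep chain, once the running product overflows and is promoted to Decimal, every remaining multiplication operates on a Decimal accumulator, whereas in a balanced tree the lower levels multiply smaller partial products, so, depending on the operand values, fewer nodes may need Decimal arithmetic.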

 

Investigation branch:
https://github.com/10gen/mongo/compare/master...anna.wawrzyniak/SERVER-72644

All perf results:
https://docs.google.com/spreadsheets/d/1DQ8LprJYCt_DZxkhsnOtYoz-UOgKjPCagIvOArEt3Dc/edit#gid=0

Comment by Anna Wawrzyniak [ 16/Mar/23 ]

After investigating setEquals variations, it looks like most of the degradation comes from the checks we add to our generated expressions, namely the type check beginning with

 if ((typeMatch(s4, 1088ll) ?: true)

and the array check

 if (!(isArray(s4)) || (!(isArray(s4)) || !(isArray(s5))))
 then fail(7158100, "All operands of $setEquals must be arrays.")

Moving those checks into a builtin function such as mqlSetEquals, which performs them internally and then calls the core setEquals, improved performance by roughly 74-126% in the 50- and 100-argument cases, reversing most of the regression.
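The idea behind a builtin like mqlSetEquals is to do the type check, the array check, and the set comparison in a single native call instead of evaluating interpreted `typeMatch`/`isArray` expressions for each operand. A minimal JavaScript model of those semantics follows. It assumes, following the generated plan, that a null or missing operand yields null and a non-array operand raises error 7158100. `mqlSetEquals` here is a hypothetical stand-in, and elements are compared with JavaScript `Set` equality (suitable for scalars only) rather than BSON comparison.

```javascript
// Hypothetical model of a fused "check + compute" setEquals builtin.
// Missing fields are modeled as `undefined`, BSON null as `null`.
function mqlSetEquals(...operands) {
  // Null/missing check first, mirroring the `then null` branch of the plan.
  if (operands.some((x) => x === null || x === undefined)) return null;
  // Array check second, mirroring fail(7158100, ...) in the plan.
  if (!operands.every(Array.isArray)) {
    const err = new Error("All operands of $setEquals must be arrays.");
    err.code = 7158100;
    throw err;
  }
  // Core comparison: every operand must have the same set of distinct elements.
  const first = new Set(operands[0]);
  return operands.slice(1).every((arr) => {
    const s = new Set(arr);
    return s.size === first.size && [...s].every((x) => first.has(x));
  });
}

console.log(mqlSetEquals([1, 2], [2, 1, 1], [1, 2])); // true
console.log(mqlSetEquals([1, 2], [1, 3]));            // false
console.log(mqlSetEquals([1, 2], null));              // null
```

Because the checks run inside one call rather than as up to 2N interpreted expression nodes, the saving presumably grows with the operand count, which is consistent with the gain rising from 16% at 10 arguments to 126% at 100 arguments in the table that follows.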

master      mqlSetEquals  improvement  benchmark
2731638466  1571666503    73.81%       VariadicExpressionSetEqualsFifty.Crud.ftdc
2731460122  1571471998    73.82%       VariadicExpressionSetEqualsFifty.VariadicAggExpressionSetEqualsFifty.ftdc
4281358874  1890906101    126.42%      VariadicExpressionSetEqualsHundred.Crud.ftdc
4281147969  1890675577    126.43%      VariadicExpressionSetEqualsHundred.VariadicAggExpressionSetEqualsHundred.ftdc
1473774785  1268050867    16.22%       VariadicExpressionSetEqualsTen.Crud.ftdc
1473626448  1267878303    16.23%       VariadicExpressionSetEqualsTen.VariadicAggExpressionSetEqualsTen.ftdc
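The improvement column can be reproduced as `master / mqlSetEquals - 1`, which implies that lower values are better for this metric (the table does not state its unit). A quick check over the rows of the table:

```javascript
// Rows from the table: [benchmark, master, mqlSetEquals].
const rows = [
  ["VariadicExpressionSetEqualsFifty.Crud.ftdc", 2731638466, 1571666503],
  ["VariadicExpressionSetEqualsFifty.VariadicAggExpressionSetEqualsFifty.ftdc", 2731460122, 1571471998],
  ["VariadicExpressionSetEqualsHundred.Crud.ftdc", 4281358874, 1890906101],
  ["VariadicExpressionSetEqualsHundred.VariadicAggExpressionSetEqualsHundred.ftdc", 4281147969, 1890675577],
  ["VariadicExpressionSetEqualsTen.Crud.ftdc", 1473774785, 1268050867],
  ["VariadicExpressionSetEqualsTen.VariadicAggExpressionSetEqualsTen.ftdc", 1473626448, 1267878303],
];

// Percentage by which the master value exceeds the mqlSetEquals value.
function improvementPct(master, mql) {
  return (master / mql - 1) * 100;
}

for (const [name, master, mql] of rows) {
  console.log(`${name}: ${improvementPct(master, mql).toFixed(2)}%`);
}
```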

example query:

db.t2.aggregate([{$project: {setEquals: {$setEquals: ["$a", "$a", "$b"]}}}])

plan before:

[2] project [s8 = traverseP(s6, lambda(l1.0) { makeBsonObj(MakeObjSpec(keep, ["_id"], ["setEquals"]), l1.0, 
    if ((typeMatch(s4, 1088ll) ?: true) || ((typeMatch(s4, 1088ll) ?: true) || (typeMatch(s5, 1088ll) ?: true))) 
    then null 
    else 
        if (!(isArray(s4)) || (!(isArray(s4)) || !(isArray(s5)))) 
        then fail(7158100, "All operands of $setEquals must be arrays.") 
        else setEquals(s4, s4, s5) 
) }, Nothing)] 
[1] scan s6 s7 none none none none [s4 = a, s5 = b] @"f541710e-34a4-4701-8902-5c68316ea153" true false 

plan after:

[2] project [s8 = traverseP(s6, lambda(l1.0) { makeBsonObj(MakeObjSpec(keep, ["_id"], ["setEquals"]), l1.0, mql_setEquals(s4, s4, s5)) }, Nothing)] 
[1] scan s6 s7 none none none none [s4 = a, s5 = b] @"f541710e-34a4-4701-8902-5c68316ea153" true false 
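To see how the interpreted checks in the plan before scale with the operand count, the following sketch rebuilds its two condition strings for an arbitrary list of slots. It is a hypothetical reconstruction of the printed shape only (the real stage builder is C++ and is not shown in this ticket), and the helper names are made up for illustration.

```javascript
// Fold boolean terms into the right-nested `a || (b || c)` shape seen in the plan.
function rightNestedOr(terms) {
  const [head, ...rest] = terms;
  if (rest.length === 0) return head;
  const tail = rightNestedOr(rest);
  return rest.length === 1 ? `${head} || ${tail}` : `${head} || (${tail})`;
}

// Per-operand type check (first `if` in the plan before).
function nullCheck(slots) {
  return rightNestedOr(slots.map((s) => `(typeMatch(${s}, 1088ll) ?: true)`));
}

// Per-operand array check (second `if` in the plan before).
function arrayCheck(slots) {
  return rightNestedOr(slots.map((s) => `!(isArray(${s}))`));
}

const slots = ["s4", "s4", "s5"];
console.log(nullCheck(slots));
console.log(arrayCheck(slots));
```

For the three slots of the example query this reproduces the two conditions of the plan before. Assuming the same pattern holds for N operands, a 100-argument $setEquals would carry up to 200 interpreted checks (100 typeMatch plus 100 isArray) per document, versus a single mql_setEquals call in the plan after.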

Generated at Thu Feb 08 06:22:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.