[SERVER-46689] $bucketAuto should optimize its expressions Created: 06/Mar/20  Updated: 06/Dec/22  Resolved: 19/May/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Percy Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-47264 Backport DocumentSourceBucketAuto::op... Closed
Assigned Teams:
Query

 Description   

In SERVER-45447 we found that $bucketAuto does not override the optimize() method, which means expressions inside a $bucketAuto were not being constant-folded.  We've fixed this in 4.4.

This has cost us some test coverage, because queries using $bucketAuto can return different results on 4.2 and 4.4. Many of these cases involve the query erroring on one version but not the other, but some may involve two slightly different result sets.

Giving $bucketAuto an optimize() method on 4.2 would be a small code change, would make some queries faster, and would make our 4.4 tests greener.  But it could also cause subtle behavior changes.



 Comments   
Comment by David Percy [ 13/Mar/20 ]

optimize(), as currently implemented, can change the results of a query, because some operators, such as $add and $multiply, are marked as associative even though they are not associative on floating-point values.

$add is not associative because floating-point addition is not associative. An expression like 1 + 1e99 + -1e99 is parsed as (1 + 1e99) + -1e99 and evaluates to 0 because of rounding error: 1 + 1e99 rounds to exactly 1e99, so the 1 is lost. Reassociating it to 1 + (1e99 + -1e99) removes the rounding error and changes the result to 1.
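The rounding effect is a property of IEEE 754 doubles, not of the server, so it can be reproduced in plain JavaScript (the language of the mongo shell). This minimal illustration is not server code; it just evaluates the two groupings side by side:

```javascript
// IEEE 754 doubles: 1 is far smaller than the spacing between adjacent
// doubles near 1e99, so 1 + 1e99 rounds back to exactly 1e99.
const leftToRight = (1 + 1e99) + -1e99; // how 1 + 1e99 + -1e99 is parsed
const regrouped = 1 + (1e99 + -1e99);   // constants grouped together first

console.log(leftToRight); // 0
console.log(regrouped);   // 1
```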

When optimize() sees an $add, it reorders the arguments to put the constant arguments together, so it can constant-fold them. You can observe this by changing a constant to a variable using $let:

> db.coll.aggregate([ {$project: {_id: {$add: [1, 1e99, -1e99]}}} ])
{ "_id" : 0 }
> db.coll.aggregate([ {$project: {_id: {$let: {vars: {one: 1.0}, in: {$add: ["$$one", 1e99, -1e99]}}} }} ])
{ "_id" : 1 }

In the first example, the $add gets optimized to {$const: 0} because all its arguments are constant, but in the second example, it gets optimized to {$add: ["$$one", 0]}, which evaluates to 1 + 0 = 1.
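To make the reordering concrete, here is a hypothetical JavaScript model of an optimizer that treats $add as associative. It is not the server's C++ implementation; the helpers foldAdd, evaluate, and evaluateOptimized are illustrative names only. The model groups the constant arguments, folds them left to right, and keeps the non-constant arguments in front, which reproduces the two rewritten forms described for the shell examples:

```javascript
// Hypothetical model: an argument is either a number (a constant) or a
// string such as "$$one" naming a variable bound by $let.

// Evaluate an $add strictly left to right, as written (no reassociation).
function evaluate(args, vars) {
  return args
    .map((a) => (typeof a === "number" ? a : vars[a.slice(2)]))
    .reduce((acc, x) => acc + x);
}

// Treat $add as associative: pull the constants together and fold them.
function foldAdd(args) {
  const constants = args.filter((a) => typeof a === "number");
  const others = args.filter((a) => typeof a !== "number");
  if (constants.length === 0) return { $add: others };
  const folded = constants.reduce((acc, c) => acc + c);
  if (others.length === 0) return { $const: folded };
  return { $add: [...others, folded] };
}

// Evaluate the expression produced by foldAdd.
function evaluateOptimized(expr, vars) {
  return "$const" in expr ? expr.$const : evaluate(expr.$add, vars);
}

const vars = { one: 1.0 };
const allConst = [1, 1e99, -1e99];
const withVar = ["$$one", 1e99, -1e99];

console.log(JSON.stringify(foldAdd(allConst)));          // {"$const":0}
console.log(evaluateOptimized(foldAdd(allConst), vars)); // 0
console.log(JSON.stringify(foldAdd(withVar)));           // {"$add":["$$one",0]}
console.log(evaluateOptimized(foldAdd(withVar), vars));  // 1
console.log(evaluate(withVar, vars));                    // 0 (unoptimized)
```

Evaluated as written, the $let version also gives 0; only after the constants are regrouped does it give 1, which is the behavior change an optimize() method on 4.2 could introduce.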

Comment by David Storch [ 13/Mar/20 ]

david.percy, it makes sense to me that this difference between 4.2 and 4.4 could cause one query to error and the other not in the agg multiversion fuzzer. But how can it cause two slightly different results? If the result sets really are different, then it seems like there is a bug in the server. Alternatively, perhaps the fuzzer's comparison logic should consider the result sets equal?

Generated at Thu Feb 08 05:12:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.