[SERVER-66310] Make ExpressionSetUnion::isCommutative() collation aware Created: 09/May/22 Updated: 29/Oct/23 Resolved: 27/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.13, 5.0.7, 4.2.20, 6.0.0-rc5, 6.1.0-rc0 |
| Fix Version/s: | 4.2.23, 4.4.17, 6.0.1, 5.0.11, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Timour Katchaounov | Assignee: | James Wahlin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: | v6.0, v5.0, v4.4, v4.2 |
||||||||||||||||
| Steps To Reproduce: | To reproduce run the following:
|
||||||||||||||||
| Sprint: | QO 2022-05-16, QO 2022-05-30 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 18 | ||||||||||||||||
| Description |
|
The $setUnion aggregation expression is currently defined to be always commutative. This breaks down when a collation is in place that can compare two different binary values as equal. We should consider making ExpressionSetUnion::isCommutative() return false when a non-simple collation is in place. |
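As a rough illustration of the proposed change, the sketch below models an expression whose commutativity depends on the active collation. It is written in Python for brevity rather than the server's C++. `Collation`, `Operand`, `SetUnion`, and `is_commutative` are hypothetical names, and the step that moves constant operands is only an assumption about what a reordering optimization might do, not a description of the server's optimizer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Collation:
    # Hypothetical flag: True means plain binary comparison.
    simple: bool


@dataclass
class Operand:
    name: str
    constant: bool


@dataclass
class SetUnion:
    operands: List[Operand]
    collation: Optional[Collation] = None

    def is_commutative(self) -> bool:
        # Proposed rule: reordering is only safe under binary comparison.
        return self.collation is None or self.collation.simple

    def optimize(self) -> "SetUnion":
        if not self.is_commutative():
            return self  # keep the operand order exactly as written
        # Toy reordering: stable sort that moves constants to the end.
        reordered = sorted(self.operands, key=lambda op: op.constant)
        return SetUnion(reordered, self.collation)


ops = [Operand("const", True), Operand("$field", False)]
simple = SetUnion(ops, Collation(simple=True)).optimize()
non_simple = SetUnion(ops, Collation(simple=False)).optimize()

print([op.name for op in simple.operands])      # operands may be reordered
print([op.name for op in non_simple.operands])  # original order is kept
```

With the guard in place, the toy optimizer leaves the operand order alone whenever a non-simple collation is active, so the order of set insertion cannot differ between the optimized and non-optimized forms.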
| Comments |
| Comment by Githook User [ 11/Aug/22 ] | |||||||||||||||||||
|
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}Message: | |||||||||||||||||||
| Comment by Alya Berciu [ 11/Aug/22 ] | |||||||||||||||||||
|
The backport to 5.0 was completed a while ago, but for some reason there was no comment added by the bot (commit). | |||||||||||||||||||
| Comment by Githook User [ 11/Aug/22 ] | |||||||||||||||||||
|
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}Message: | |||||||||||||||||||
| Comment by Githook User [ 21/Jul/22 ] | |||||||||||||||||||
|
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}Message: (cherry picked from commit 2c53b7b684c8dd90044b8ef19932453088f54869) | |||||||||||||||||||
| Comment by Githook User [ 26/May/22 ] | |||||||||||||||||||
|
Author: {'name': 'James Wahlin', 'email': 'james@mongodb.com', 'username': 'jameswahlin'}Message: | |||||||||||||||||||
| Comment by James Wahlin [ 16/May/22 ] | |||||||||||||||||||
|
This turned out to be expected behavior; the difference is due to the order of insertion into a set with a non-simple collation. The following demonstrates the root of the problem:
This produces:
The reason that "a" and "b" contain different values is that the collation (default strength 3) compares the empty string and the Unicode string "\u0001" (a control character) as equal. The first value inserted into the set is the one that wins. For the original reproducer, it looks like optimizing the pipeline changes the order of set insertion. The fix will likely be to change ExpressionSetUnion::isCommutative() to return false when a non-simple collation is in place. | |||||||||||||||||||
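The first-inserted-wins behavior can be modeled outside the server. The following is a minimal Python sketch, not MongoDB code. `collation_key` is a hypothetical stand-in for a non-simple collation that, purely as an assumption for illustration, ignores control characters so that the empty string and "\u0001" compare equal. `set_union` is likewise a made-up helper.

```python
import unicodedata


def collation_key(s: str) -> str:
    # Toy stand-in for a non-simple collation (an assumption, not ICU):
    # control characters are dropped, so the empty string and U+0001
    # end up with the same key.
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cc")


def set_union(*arrays):
    # Union that keeps the first value seen for each collation key.
    seen = {}
    for array in arrays:
        for value in array:
            # setdefault leaves an existing entry untouched: first insert wins.
            seen.setdefault(collation_key(value), value)
    return list(seen.values())


a = set_union([""], ["\u0001"])
b = set_union(["\u0001"], [""])

print(repr(a))  # the empty string was inserted first
print(repr(b))  # the control character was inserted first
```

Both results have one element and are equal under the toy collation, but they differ byte for byte. That is the sense in which the union stops being commutative once the comparison is not binary.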
| Comment by James Wahlin [ 16/May/22 ] | |||||||||||||||||||
|
This reproduces under normal collections at least as far back as 4.2, which is when we introduced the | |||||||||||||||||||
| Comment by James Wahlin [ 12/May/22 ] | |||||||||||||||||||
|
It appears that the ordering of the elements in the $setUnion can impact the result on 6.0+. If you move the $reduce element to the front of the $setUnion array, both the optimized and non-optimized pipelines produce the same result. |