[SERVER-76060] sbe::value::serializeValue/deserializeValue omits ArraySet collator Created: 13/Apr/23 Updated: 30/Oct/23 Resolved: 10/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ivan Fefer | Assignee: | Rui Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Query Execution
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: |
v7.0, v6.0
|
||||
| Sprint: | QE 2023-05-29, QE 2023-06-26, QE 2023-07-10, QE 2023-07-24 | ||||
| Participants: | |||||
| Description |
|
When we serialize ArraySet on src/mongo/db/exec/sbe/values/row.cpp#L276 we don't include any information about collation. So when we deserialize it on L118 we create new ArraySet without collation, which can potentially lead to problems. |
| Comments |
| Comment by Githook User [ 27/Oct/23 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}Message:
(cherry picked from commit a19074b842b752ee0a61810e0b8f6d79c5aa80c1)
(cherry picked from commit d0811e844e8566dc276fcd73fceabec71c0e2717) |
| Comment by Githook User [ 27/Oct/23 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}Message:
(cherry picked from commit a19074b842b752ee0a61810e0b8f6d79c5aa80c1)
(cherry picked from commit d0811e844e8566dc276fcd73fceabec71c0e2717) |
| Comment by Githook User [ 10/Jul/23 ] |
|
Author: {'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}Message: |
| Comment by David Storch [ 01/Jun/23 ] |
|
As discussed over Slack, rui.liu@mongodb.com is going to write a script to prove that there is a real bug here. Something like $group with an $addToSet that has to spill and where the query uses a case-insensitive collation. In this case, after merging spilled partial aggregates, the $addToSet value could end up with strings that only differ by their case, meaning that the case-insensitive collation was not respected. |
| Comment by Rui Liu [ 12/May/23 ] |
|
rushan.chen@mongodb.com Sorry missed the comment. Nope, the accumulator we plan to implement will not use those data structures and the collator is not saved in the state. |
| Comment by Ivan Fefer [ 19/Apr/23 ] |
|
So the answer is "it depends". Previous similar bug Here we are dealing with general serialization/deserialization function, which doesn't return exactly the same object after serialization/deserialization. So imagine SBE code that does the following:
However, this actually doesn't happen: to my knowledge there is no valid MQL query that will produce something like that. So to conclude, current version works and it performs better than if we support collation, but it is not "future-proof". Maybe we can "future-proof" it by explicitly converting ArraySet to Array when serializing, so this behavior would be expected and documented in code and unit tests? |
| Comment by Kyle Suarez [ 18/Apr/23 ] |
|
After discussion in the triage meeting, we weren't sure this was a bug. david.storch@mongodb.com thought that if we omit a collator then we would no longer use the hash table and instead you would need to walk the data structure linearly? ivan.fefer@mongodb.com, is this a correctness issue or are you thinking of this as a performance improvement? |