[SERVER-82197] Incorrect query results in SBE if $group spills in presence of collation Created: 13/Oct/23  Updated: 24/Jan/24  Resolved: 08/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.1, 7.3.0-rc0, 7.0.5, 6.0.13

Type: Bug Priority: Major - P3
Reporter: Irina Yatsenko (Inactive) Assignee: Foteini Alvanaki
Resolution: Fixed Votes: 0
Labels: auto-reverted, bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
duplicates SERVER-81390 HashAggStage fails to respect the col... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.2, v7.0, v6.0
Steps To Reproduce:

1. Set internalQuerySlotBasedExecutionHashAggForceIncreasedSpilling to true to force HashAggStage to spill.
2. db.createCollection("lp", {collation: {locale: 'en_US', strength: 2}}) // case-insensitive collation
3. db.lp.insertMany([{"_id":0, key:"A"},{"_id":1, key:"A"},{"_id":2, key:"B"},{"_id":3, key:"B"},{"_id":4, key:"a"},{"_id":5, key:"a"}])
4. db.lp.aggregate({$group: {_id: "$key"}})
 
The results contain separate groups for "A" and "a", even though the two compare equal under the case-insensitive collation:
{ "_id" : "A" }
{ "_id" : "B" }
{ "_id" : "a" }

Participants:
Linked BF Score: 35

 Description   

When HashAggStage processes the inputs from its child, keys that are equal under the collation but not byte-identical may arrive separated by another key (for example, "A" and "a" in the repro are separated by "B"). As a result, the record store holding the spilled records may not place "A" and "a" next to each other, but HashAggStage::getNextSpilled() assumes that they are adjacent.
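A minimal JavaScript sketch of this failure mode follows. It is not the server's code: it assumes the spill record store returns records ordered by their serialized key, and that the merge step combines only adjacent records whose keys are equal under the collation (the getNextSpilled() assumption described above). collationKey() and spilledGroup() are hypothetical names, and toLowerCase() is only a stand-in for a strength-2 collator's comparison key. With raw serialization the sketch yields the three groups seen in the repro; with collation-aware serialization "A" and "a" sort together and merge.

```javascript
// Hypothetical stand-in for a strength-2 (case-insensitive) comparison key;
// a real collator produces ICU sort keys, not lowercase strings.
const collationKey = (s) => s.toLowerCase();
const collationEqual = (a, b) => collationKey(a) === collationKey(b);

// Simulates a $group with no accumulators that spills after every input.
// serializeKey decides the order of the spilled "record store".
function spilledGroup(keys, serializeKey) {
  // Spill phase: one record per input, as if memory were always exhausted.
  const records = keys.map((k) => ({ sortKey: serializeKey(k), groupId: k }));

  // The record store returns records ordered by their serialized key
  // (Array.prototype.sort is stable, so ties keep insertion order).
  records.sort((x, y) => (x.sortKey < y.sortKey ? -1 : x.sortKey > y.sortKey ? 1 : 0));

  // Merge phase: only *adjacent* records whose keys are collation-equal are
  // combined; the first record seen provides the group's _id.
  const groups = [];
  for (const rec of records) {
    const last = groups[groups.length - 1];
    if (last !== undefined && collationEqual(last._id, rec.groupId)) continue;
    groups.push({ _id: rec.groupId });
  }
  return groups;
}

const input = ["A", "A", "B", "B", "a", "a"];

// Raw (binary) serialization: "A" < "B" < "a", so "A" and "a" are not adjacent.
const buggy = spilledGroup(input, (k) => k);
// Collation-aware serialization: "A" and "a" share a sort key and end up adjacent.
const fixed = spilledGroup(input, collationKey);

console.log(buggy); // [ { _id: 'A' }, { _id: 'B' }, { _id: 'a' } ]
console.log(fixed); // [ { _id: 'A' }, { _id: 'B' } ]
```

The point of the sketch is that sorting and merging must agree on equality: if the merge step compares keys with the collator, the sort order of the spilled records must be derived from the collator as well.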

This bug is the cause of SERVER-80374 and BF-30120, due to how the tests in timeseries_lastpoint_top.js check their results against a non-timeseries collection.



 Comments   
Comment by Githook User [ 08/Jan/24 ]

Author:

{'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}

Message: SERVER-82197 Handle collation in KeyString in case of Object/array

(cherry picked from commit 9af9ff389e464faa970d89b07c2d3da8e5992a63)
Branch: v7.2
https://github.com/mongodb/mongo/commit/bc2d813e909e2fad67175591757f3b3bc7ce99fe

Comment by Githook User [ 19/Dec/23 ]

Author:

{'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}

Message: SERVER-82197 Handle collation in KeyString in case of Object/array

(cherry picked from commit 9af9ff389e464faa970d89b07c2d3da8e5992a63)

GitOrigin-RevId: 1044821cf7cf6c1fe6892293c50fdc1de89e749b
Branch: v6.0
https://github.com/mongodb/mongo/commit/a66a4c950484c5613fcb6e6338ff414ac7191382

Comment by Githook User [ 19/Dec/23 ]

Author:

{'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}

Message: SERVER-82197 Handle collation in KeyString in case of Object/array

(cherry picked from commit 9af9ff389e464faa970d89b07c2d3da8e5992a63)

GitOrigin-RevId: 53b69e94d8c7f905ce797c81a027c7808835799a
Branch: v7.0
https://github.com/mongodb/mongo/commit/0ebd34b70f084081421a2f2d2c6e276e2c333548

Comment by Githook User [ 18/Dec/23 ]

Author:

{'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}

Message: SERVER-82197 Handle collation in KeyString in case of Object/array

GitOrigin-RevId: 9af9ff389e464faa970d89b07c2d3da8e5992a63
Branch: master
https://github.com/mongodb/mongo/commit/5e4cfbc777c228c4430cf569e77423e0994af11f

Comment by Githook User [ 22/Nov/23 ]

Author:

{'name': 'auto-revert-processor', 'email': 'dev-prod-dag@mongodb.com', 'username': ''}

Message: Revert "SERVER-82197 Handle collation in KeyString in case of Object/array"

This reverts commit 96841a6a6c2ebe78b01cf8c6a919b3e34838095f.
Branch: master
https://github.com/mongodb/mongo/commit/9e99c85318ed4054e787e6b295f1006962743f25

Comment by Githook User [ 22/Nov/23 ]

Author:

{'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}

Message: SERVER-82197 Handle collation in KeyString in case of Object/array
Branch: master
https://github.com/mongodb/mongo/commit/96841a6a6c2ebe78b01cf8c6a919b3e34838095f

Comment by Foteini Alvanaki [ 13/Nov/23 ]

I looked at the code. The problem is that the collator is not used when we serialize an array or object key.

 

I think we should handle this ticket and SERVER-61629 together. This is not necessary after all.
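The diagnosis in this comment can be illustrated with a hypothetical JavaScript sketch: a key serializer that applies the collation only to top-level strings lets ["A"] and ["a"] produce different keys, while one that recurses into arrays and objects makes them identical. comparisonKey(), serializeTopLevelOnly(), serializeWithCollation() and countGroups() are illustrative names, and toLowerCase() stands in for the collator's comparison key; the actual fix changes how KeyString is built, which this sketch does not reproduce. Field names are left untouched, on the assumption that collation applies to string values but not to field names.

```javascript
// Hypothetical stand-in for a strength-2 (case-insensitive) comparison key;
// a real collator produces ICU sort keys, not lowercase strings.
const comparisonKey = (s) => s.toLowerCase();

// Buggy variant: only a top-level string key goes through the collator,
// so strings nested inside arrays or objects are serialized verbatim.
function serializeTopLevelOnly(v) {
  return typeof v === "string" ? JSON.stringify(comparisonKey(v)) : JSON.stringify(v);
}

// Fixed variant: recurse into arrays and objects so that every nested string
// value is replaced by its comparison key. Field names are kept as-is.
function serializeWithCollation(v) {
  if (typeof v === "string") return JSON.stringify(comparisonKey(v));
  if (Array.isArray(v)) return "[" + v.map(serializeWithCollation).join(",") + "]";
  if (v !== null && typeof v === "object") {
    const fields = Object.keys(v).map(
      (f) => JSON.stringify(f) + ":" + serializeWithCollation(v[f])
    );
    return "{" + fields.join(",") + "}";
  }
  return JSON.stringify(v); // numbers, booleans, null: no collation needed
}

// Count distinct groups, treating identical serialized keys as one group.
const countGroups = (keys, serialize) => new Set(keys.map(serialize)).size;

const arrayKeys = [["A"], ["A"], ["B"], ["B"], ["a"], ["a"]];
const objectKeys = [{ k: "A" }, { k: "B" }, { k: "a" }];

console.log(countGroups(arrayKeys, serializeTopLevelOnly));   // 3
console.log(countGroups(arrayKeys, serializeWithCollation));  // 2
console.log(countGroups(objectKeys, serializeTopLevelOnly));  // 3
console.log(countGroups(objectKeys, serializeWithCollation)); // 2
```

The array case mirrors the repro in the comment below, where SBE returned a separate [ "a" ] group while the classic engine did not.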

Comment by Irina Yatsenko (Inactive) [ 11/Nov/23 ]

The problem still reproduces if the key is an array or an object.

For example, with arrays:

// Either use a debug build or set internalQuerySlotBasedExecutionHashAggForceIncreasedSpilling to true to increase spilling.
db.createCollection("u", {collation: {locale: 'en_US', strength: 2}})
db.u.insertMany([{"_id":0, key:["A"]},{"_id":1, key:["A"]},{"_id":2, key:["B"]},{"_id":3, key:["B"]},{"_id":4, key:["a"]},{"_id":5, key:["a"]}])
db.u.aggregate({$group: {_id: "$key"}})
 
Result in the classic engine:
{ "_id" : [ "A" ] }
{ "_id" : [ "B" ] }
 
Result in SBE:
{ "_id" : [ "A" ] }
{ "_id" : [ "B" ] }
{ "_id" : [ "a" ] }

Comment by Foteini Alvanaki [ 16/Oct/23 ]

Yes, this is a duplicate of SERVER-81390. I am closing this issue.

Comment by David Storch [ 16/Oct/23 ]

At a glance this looks like a duplicate of SERVER-81390, for which foteini.alvanaki@mongodb.com already has a fix. Foteini, could you confirm and close as a dupe?

Generated at Thu Feb 08 06:48:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.