[SERVER-81390] HashAggStage fails to respect the collation when spilling to disk
Created: 22/Sep/23
Updated: 10/Nov/23
Resolved: 27/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.2, 6.0.11, 7.1.0
Fix Version/s: 7.1.1, 7.2.0-rc0, 6.0.12, 7.0.4

Type: Bug
Priority: Major - P3
Reporter: Justin Seyster
Assignee: Foteini Alvanaki
Resolution: Fixed
Votes: 0
Labels: query-director-triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-82197 Incorrect query results in SBE if $gr... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.1, v7.0, v6.0
Sprint: QE 2023-10-16, QE 2023-10-30
Linked BF Score: 20

 Description   

The HashAgg spill algorithm converts each group key to a KeyString but does not provide the conversion operation with a function to normalize the key according to the collator. As a result, keys that would be considered equal in the in-memory hash table are considered distinct in the spilled data.

This updated jstest exercises the problem. The test fails by default but passes when the pipeline is forced to run in the Classic engine.

diff --git a/jstests/noPassthrough/group_spill_with_collation.js b/jstests/noPassthrough/group_spill_with_collation.js
index f612e8b59bb..42d56ca005a 100644
--- a/jstests/noPassthrough/group_spill_with_collation.js
+++ b/jstests/noPassthrough/group_spill_with_collation.js
@@ -2,6 +2,7 @@
  * Tests $group execution with increased spilling and a non-simple collation.
  */
 
+import {assertArrayEq} from "jstests/aggregation/extras/utils.js";
 import {checkSBEEnabled} from "jstests/libs/sbe_util.js";
 
 const conn = MongoRunner.runMongod();
@@ -24,12 +25,35 @@ for (let i = 0; i < 1000; i++) {
 
 assert.commandWorked(db.adminCommand(
     {setParameter: 1, internalQuerySlotBasedExecutionHashAggForceIncreasedSpilling: true}));
+
+// Test that accumulators respect the collation when the group operation spills to disk.
 const caseInsensitive = {
     collation: {locale: "en_US", strength: 2}
 };
-const results =
+let results =
     coll.aggregate([{$group: {_id: null, result: {$addToSet: "$x"}}}], caseInsensitive).toArray();
 assert.eq(1, results.length, results);
 assert.eq({_id: null, result: ["a"]}, results[0]);
 
+// Test that comparisons of the group key respect the collation when the group operation spills to
+// disk.
+for (let i = 0; i < 1000; i++) {
+    if (i % 3 === 0) {
+        assert.commandWorked(coll.insert({x: 'b'}));
+    } else {
+        assert.commandWorked(coll.insert({x: 'B'}));
+    }
+}
+
+results =
+    coll.aggregate(
+            [{$group: {_id: "$x", normalizedX: {$first: {$toLower: "$x"}}, count: {$count: {}}}}],
+            caseInsensitive)
+        .toArray();
+assertArrayEq({
+    actual: results,
+    expected: [{normalizedX: "a", count: 1000}, {normalizedX: "b", count: 1000}],
+    fieldsToSkip: ["_id"]
+});
+
 MongoRunner.stopMongod(conn);
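
To illustrate the mismatch described in this ticket outside the server, here is a minimal sketch in plain JavaScript. It is not server code: `comparisonKey` and `groupCounts` are hypothetical names, and lowercasing is only a stand-in for the comparison key that a real case-insensitive collator would produce. The sketch shows how grouping on raw keys splits values that a case-insensitive collation treats as equal, while grouping on normalized keys does not.

```javascript
// Stand-in for a collator's comparison key (assumption: lowercasing only
// approximates a case-insensitive collation, and only for ASCII input).
function comparisonKey(value) {
    return value.toLowerCase();
}

// Counts how many values fall under each group key. `keyFn` plays the role of
// a (hypothetical) normalization hook applied before a key is stored.
function groupCounts(values, keyFn) {
    const counts = new Map();
    for (const value of values) {
        const key = keyFn(value);
        counts.set(key, (counts.get(key) || 0) + 1);
    }
    return counts;
}

const values = ["a", "A", "b", "B", "B"];

// Raw keys: "a"/"A" and "b"/"B" end up in separate groups.
const rawGroups = groupCounts(values, (v) => v);

// Normalized keys: each letter forms a single group, regardless of case.
const collatedGroups = groupCounts(values, comparisonKey);

console.log(rawGroups.size);      // 4
console.log(collatedGroups.size); // 2
```

In this analogy, the raw-key grouping corresponds to the spilled data and the normalized-key grouping corresponds to the in-memory hash table.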



 Comments   
Comment by Githook User [ 27/Oct/23 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-81390 Use collator to create the record key in hash_agg

SERVER-76060 Deserialize SBE ArraySet with collator

(cherry picked from commit a19074b842b752ee0a61810e0b8f6d79c5aa80c1)

SERVER-81390 Use collator to create the record key in hash_agg

(cherry picked from commit d0811e844e8566dc276fcd73fceabec71c0e2717)
Branch: v6.0
https://github.com/mongodb/mongo/commit/62d51d7f2698547635f4df12049f70ba283b74b7

Comment by Githook User [ 27/Oct/23 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-81390 Use collator to create the record key in hash_agg

SERVER-76060 Deserialize SBE ArraySet with collator

(cherry picked from commit a19074b842b752ee0a61810e0b8f6d79c5aa80c1)

SERVER-81390 Use collator to create the record key in hash_agg

(cherry picked from commit d0811e844e8566dc276fcd73fceabec71c0e2717)
Branch: v7.0
https://github.com/mongodb/mongo/commit/62c4770285c731eb38bc0148c2f2d2fbc9f43dcb

Comment by Githook User [ 27/Oct/23 ]

Author:

{'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}

Message: SERVER-81390 Use collator to create the record key in hash_agg
Branch: v7.1
https://github.com/mongodb/mongo/commit/4120fda44e175a61a70a91bd116834761c22f925

Comment by Githook User [ 19/Oct/23 ]

Author:

{'name': 'Foteini Alvanaki', 'email': 'foteini.alvanaki@mongodb.com', 'username': ''}

Message: SERVER-81390 Use collator to create the record key in hash_agg
Branch: master
https://github.com/mongodb/mongo/commit/d0811e844e8566dc276fcd73fceabec71c0e2717

Comment by Foteini Alvanaki [ 12/Oct/23 ]

I confirmed that v6.0, v7.0, and v7.1 are all affected by this bug.

Comment by Justin Seyster [ 26/Sep/23 ]

ana.meza@mongodb.com, yes, there is a potential for incorrect output for any pipeline with a collation and a $group operation that both executes in SBE and needs to "spill" to disk. I haven't verified, but I believe the bug goes back to v5.2.
