[SERVER-32297] Aggregations that merge on mongos do not respect the collation Created: 13/Dec/17  Updated: 29/Jan/18  Resolved: 28/Dec/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kyle Suarez Assignee: Kyle Suarez
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-32956 Unblacklist aggregation/sources/sort/... Closed
Duplicate
duplicates SERVER-32430 DocumentSourceSort sorts array docume... Closed
Related
related to SERVER-32282 Aggregation text search returns text ... Closed
is related to SERVER-22760 Sharded aggregation pipelines which i... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Steps To Reproduce:

First, apply this patch to run with multiple shards:

diff --git a/buildscripts/resmokeconfig/suites/aggregation_sharded_collections_passthrough.yml b/buildscripts/resmokeconfig/suites/aggregation_sharded_collections_passthrough.yml
index de8b568eee..68b988f909 100644
--- a/buildscripts/resmokeconfig/suites/aggregation_sharded_collections_passthrough.yml
+++ b/buildscripts/resmokeconfig/suites/aggregation_sharded_collections_passthrough.yml
@@ -48,6 +48,7 @@ executor:
     n: 20
   fixture:
     class: ShardedClusterFixture
+    num_shards: 2
     mongos_options:
       set_parameters:
         enableTestCommands: 1

and then run

$ python2 buildscripts/resmoke.py --suites aggregation_sharded_collections_passthrough jstests/aggregation/sources/sort/collation_sort.js

Sprint: Query 2017-12-18, Query 2018-01-01
Participants:

Description

In SERVER-22760, we allowed certain aggregation pipelines to merge on mongos. However, the merging logic does not appear to respect the collation when merging sorted results from multiple shards. The bug does not reproduce when merging on mongos is disabled via internalQueryProhibitMergingOnMongoS.



Comments
Comment by Githook User [ 02/Jan/18 ]

Author: Kyle Suarez (ksuarz) <kyle.suarez@mongodb.com>

Message: SERVER-32297, SERVER-32430 fix $sort in-memory sort and $sortKey serialization

(cherry picked from commit 79352e71b697cb8c126510095bba7fd816128701)
Branch: v3.6
https://github.com/mongodb/mongo/commit/5bb6e3cb98e404a55d5edba802d24559eca20d6b

Comment by David Storch [ 28/Dec/17 ]

The fix for this bug was committed under SERVER-32430, so I'm going to mark this ticket resolved as "Duplicate" rather than "Fixed".

Comment by Kyle Suarez [ 20/Dec/17 ]

Alright, the problem is not in the AsyncResultsMerger but rather in DocumentSourceSort. If the sort requires a merge, DocumentSourceSort is responsible for serializing the sort key as a metadata field. Internally, it extracts sort keys in one of two ways: a "fast path" for when there are no arrays along the path, and a "slow path" that uses the SortKeyGenerator. Unfortunately, only the SortKeyGenerator is collation-aware: if we take the fast path, we generate sort keys with strings that have not been transformed into their ICU comparison keys.
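
To illustrate why untransformed sort keys break the merge, here is a minimal Python sketch (not server code). It assumes a case-insensitive collation and uses str.casefold as a stand-in for an ICU comparison key; merge_shards and the shard contents are hypothetical. Each shard's results are already sorted under the collation, but merging on raw string keys interleaves them incorrectly.

```python
import heapq


def collation_key(s):
    # Assumption: str.casefold approximates an ICU comparison key for a
    # case-insensitive collation. Real ICU keys are opaque byte strings.
    return s.casefold()


def merge_shards(shard_results, sort_key):
    # Merge per-shard result lists, each assumed to be sorted already,
    # comparing documents by the supplied sort key.
    return list(heapq.merge(*shard_results, key=sort_key))


# Hypothetical per-shard results, each sorted case-insensitively.
shard_a = ["a", "C"]
shard_b = ["B", "d"]

# "Fast path": the sort key is the raw string, so the merge compares
# by code point and places "B" ahead of "a".
fast_path = merge_shards([shard_a, shard_b], sort_key=lambda s: s)

# "Slow path": the sort key is the comparison key, so the merge
# agrees with the order each shard used.
slow_path = merge_shards([shard_a, shard_b], sort_key=collation_key)

print(fast_path)
print(slow_path)
```

The merge itself is doing nothing wrong in either case; it only sees keys that do not match the ordering the shards sorted by.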

I'm going to run microbenchmarks on a patch that always uses the slow SortKeyGenerator path if we have a non-simple collation. If there's a noticeable performance regression, we can try something else.
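
The shape of that patch can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual change: extract_sort_key is a hypothetical name, str.casefold stands in for an ICU comparison key, and array handling is omitted entirely.

```python
def extract_sort_key(value, collation_key=None):
    # Hypothetical helper. collation_key is None for the simple collation.
    if collation_key is None:
        # Fast path: the raw value is already a valid sort key.
        return value
    # Non-simple collation: always take the collation-aware path and
    # transform strings into their comparison keys.
    return collation_key(value) if isinstance(value, str) else value


values = ["b", "a", "C"]

# Simple collation: code-point order, uppercase sorts first.
simple_order = sorted(values, key=extract_sort_key)

# Case-insensitive collation, approximated with str.casefold.
collated_order = sorted(values, key=lambda v: extract_sort_key(v, str.casefold))

print(simple_order)
print(collated_order)
```

The cost of this approach is that every string goes through key generation whenever a collation is set, which is what the microbenchmarks are meant to measure.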
