[SERVER-24122] Make mongos sorted merge respect the collation
Created: 10/May/16  Updated: 06/Jun/16  Resolved: 31/May/16

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: None
Fix Version/s: 3.3.8

Type: Task
Priority: Major - P3
Reporter: David Storch
Assignee: David Storch
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Query 15 (06/03/16)
Participants:

Description

There are two possible strategies:

  1. Make comparisons in the AsyncResultsMerger use a collator. This will require linking ICU into mongos.
  2. Make the SORT_KEY_GENERATOR stage convert strings inside comparison keys to their corresponding CollatorInterface::ComparisonKey representations. This depends on updating key generation for nested objects and arrays as described in SERVER-23172.
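The following illustrative Python sketch (not MongoDB code) contrasts the two strategies. It assumes a hypothetical case-insensitive collator whose comparison key is approximated by `str.casefold()`. Real ICU comparison keys are opaque binary sort keys. All function names are hypothetical. Under strategy 1, the merger calls the collator on raw strings. Under strategy 2, each shard attaches a precomputed key, so the merger needs only plain ordering. `merge_binary` shows the original bug, which is a binary merge of raw strings.

```python
import heapq
from typing import Any, List, Tuple


def comparison_key(s: str) -> str:
    """Hypothetical stand-in for CollatorInterface::ComparisonKey.

    Assumption: a case-insensitive collation approximated by str.casefold().
    Real ICU comparison keys are opaque binary sort keys.
    """
    return s.casefold()


def make_sort_key(value: Any) -> Any:
    """Strategy 2, shard side: replace a top-level string with its comparison key."""
    return comparison_key(value) if isinstance(value, str) else value


def shard_stream(docs: List[str]) -> List[Tuple[Any, str]]:
    """Strategy 2, shard side: attach a precomputed sort key to each document
    (loosely analogous to the $sortKey meta projection)."""
    return [(make_sort_key(doc), doc) for doc in docs]


def merge_with_collator(shards: List[List[str]]) -> List[str]:
    """Strategy 1, mongos side: the merger consults the collator on raw strings."""
    return list(heapq.merge(*shards, key=comparison_key))


def merge_precomputed(streams: List[List[Tuple[Any, str]]]) -> List[str]:
    """Strategy 2, mongos side: plain ordering on precomputed keys, no collator."""
    return [doc for _, doc in heapq.merge(*streams, key=lambda pair: pair[0])]


def merge_binary(shards: List[List[str]]) -> List[str]:
    """The bug: a binary merge of raw strings ignores the collation."""
    return list(heapq.merge(*shards))


if __name__ == "__main__":
    # Each shard's results are already sorted under the case-insensitive collation.
    shards = [["apple", "Cherry"], ["Banana", "date"]]
    print(merge_with_collator(shards))  # ['apple', 'Banana', 'Cherry', 'date']
    print(merge_precomputed([shard_stream(s) for s in shards]))  # same order
    print(merge_binary(shards))  # ['Banana', 'apple', 'Cherry', 'date']
```

In strategy 2, `merge_precomputed` contains no collation logic, which is why mongos would not need to link ICU. This sketch converts only top-level strings. That is the same limitation that SERVER-23172 addresses for nested values.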


Comments
Comment by Githook User [ 31/May/16 ]

Author:

David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-24122 make SORT_KEY_GENERATOR convert strings in sort keys to ICU comparison keys

This means that mongos merge sorting, which is done via $sortKey meta
projection, can be done correctly without a CollatorInterface.
Branch: master
https://github.com/mongodb/mongo/commit/5df895a08fd368d124ba69239e6d311216ee4289

Comment by Tess Avitabile (Inactive) [ 20/May/16 ]

Okay, I'm fine with that.

Comment by David Storch [ 20/May/16 ]

I'm fine with the sort order being wrong for nested documents under a non-simple collator until we implement SERVER-23172. It is simply a case we haven't implemented yet. It's no worse than aggregation not respecting the collation because we haven't hooked it up yet. In both cases the result is wrong because an unimplemented subtask remains.

Comment by Tess Avitabile (Inactive) [ 20/May/16 ]

Yes, SERVER-23172 is required. But I'm not sure we should change the sort key generation stage to convert strings into their comparison keys before we implement it, because then sorts with a collation will be incorrect for documents containing strings in nested subdocuments and arrays. Currently the SORT, SORT_MERGE, and ENSURE_SORTED stages call woCompare() with a collation on the sort keys, which contain only raw strings. After this change, the sort keys would contain comparison keys at the top level but raw strings in nested subdocuments and arrays, so it's not clear how the SORT, SORT_MERGE, and ENSURE_SORTED stages should perform comparisons.
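The following minimal Python sketch illustrates this mismatch. It assumes the same hypothetical case-insensitive collator approximated by `str.casefold()`. As a simplification, an embedded document is modeled as a tuple of `(field_name, value)` pairs that are compared field by field. BSON cross-type ordering and MongoDB's array sort semantics are not modeled. `top_level_only` mirrors converting only top-level strings. `convert_recursively` mirrors the nested key generation that SERVER-23172 describes. Both names are hypothetical.

```python
from typing import Any


def comparison_key(s: str) -> str:
    # Hypothetical case-insensitive collator; a stand-in for an ICU sort key.
    return s.casefold()


def top_level_only(value: Any) -> Any:
    """Convert a top-level string; strings inside nested values stay raw."""
    return comparison_key(value) if isinstance(value, str) else value


def convert_recursively(value: Any) -> Any:
    """Convert strings at every depth. Field names are left untouched,
    since only values are subject to the collation."""
    if isinstance(value, str):
        return comparison_key(value)
    if isinstance(value, tuple):  # embedded document: (name, value) pairs
        return tuple((name, convert_recursively(v)) for name, v in value)
    if isinstance(value, list):  # array: convert elements only
        return [convert_recursively(v) for v in value]
    return value


if __name__ == "__main__":
    lower = (("name", "a"),)  # models {name: "a"}
    upper = (("name", "B"),)  # models {name: "B"}
    # Nested strings compared byte-wise: "B" < "a", contrary to the collation.
    print(sorted([upper, lower], key=top_level_only))
    # Nested strings converted: "a" sorts before "B", as the collation requires.
    print(sorted([upper, lower], key=convert_recursively))
```

With `top_level_only`, `{name: "B"}` sorts before `{name: "a"}` even though the collation places "a" first. Top-level strings alone would still sort correctly.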

Comment by David Storch [ 20/May/16 ]

Yup, but we can do it before we implement SERVER-23172. It just locks us into implementing SERVER-23172. But I think we decided SERVER-23172 was strictly required anyway for efficient oplog application, right?

Comment by Tess Avitabile (Inactive) [ 20/May/16 ]

#2 also depends on SERVER-23172, correct?

Comment by David Storch [ 20/May/16 ]

After a bit of investigation, I believe that the correct implementation strategy is #2 (changing SORT_KEY_GENERATOR to convert strings to their comparison keys). I'm halting progress for the time being, since a clean implementation of #2 depends on some of Tess Avitabile's in-flight changes.

Generated at Thu Feb 08 04:05:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.