[SERVER-63132] Allow indexed field to provide sort key when index has non-simple collation Created: 31/Jan/22  Updated: 29/Oct/23  Resolved: 26/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: James Wahlin Assignee: Peter Volk
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File newplan.txt     Text File oldplan.txt     Text File plan_different_collations.txt    
Issue Links:
Related
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Sprint: QO 2023-05-15, QO 2023-05-29, QO 2023-06-12, QO 2023-06-26, QO 2023-07-10
Participants:
Case:

 Description   

In the following example, a query is run against a collection with a non-simple collation. The index created and find command both inherit the collection collation.

While 'z' is indexed, as there is no filter or sort on 'y', it cannot be used to provide indexed sort order. The indexed value for 'z' should however be usable as a sort key in a blocking sort. This would allow the find below to examine 10 index keys, sort on 'z', and then fetch 5 documents for first 5 sorted keys. Instead we construct a query plan where we fetch the document for each index key examined, fetching 10 documents instead of 5.

db.test.drop();
db.createCollection("test", {collation: {locale: "en", strength: 2}});
db.test.createIndex({x: 1, y: 1, z: 1});
for (var i = 0; i < 10; ++i) {for (var i = 0; i < 10; ++i) {db.test.insert({x: 1, y: 1, z: i});}}
var explain = db.test.explain("executionStats").find({x: 1}).sort({z: 1}).limit(5).next();
// Ideally we would fetch only 5 documents. This assert fails as we instead fetch 10.
assert.eq(5, explain.executionStats.totalDocsExamined, explain);

The starting point for this behaviorĀ is here in our code. We call a method on the index scan node that asks whether a given field isĀ fully provided for. 'Fully provided' for an index with a collation means the index bounds generated by the index scan node cannot contain string values and we can therefore materialize the original value from the index key. For this query, we are scanning the sort field 'z' from MinKey to MaxKey, so it can contain strings. Given that, we add a fetch stage below the sort. I believe this logic is too restrictive for what we need in the sort stage. The collated value of 'z' should be sufficient for sorting purposes.



 Comments   
Comment by Githook User [ 26/Jun/23 ]

Author:

{'name': 'Peter Volk', 'email': 'peter.volk@mongodb.com', 'username': 'HCSPete'}

Message: SERVER-63132 Allow indexed field to provide sort key when index has non-simple collation
Branch: master
https://github.com/mongodb/mongo/commit/f20d677be21e115dd9b914eb9e9b80878d00ce91

Comment by David Storch [ 04/Feb/22 ]

An implementation note for future reference: I believe as part of the work to implement this ticket we would need to modify the sort key generation logic to be able to deal with the case where the incoming index keys already contain ICU collation keys. At the moment, the logic for building a sort key from an index key always assumes that it must collation-encode any incoming strings:

https://github.com/mongodb/mongo/blob/78764d1593483cb96c308de98b3d5bad8a4b22ee/src/mongo/db/index/sort_key_generator.cpp#L97-L102

We would have to change this logic to be aware of whether to collation-encode the incoming index keys, or to avoid doing so because they are already collation-encoded.

Generated at Thu Feb 08 05:56:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.