[SERVER-51120] Find queries with SORT_MERGE incorrectly sort the results when the collation is specified Created: 24/Sep/20  Updated: 29/Oct/23  Resolved: 02/Oct/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.2, 4.2.11, 4.0.21, 3.6.21

Type: Bug Priority: Critical - P2
Reporter: Mindaugas Malinauskas Assignee: Mindaugas Malinauskas
Resolution: Fixed Votes: 0
Labels: KP42, KP44
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0, v3.6
Steps To Reproduce:

> db.c.createIndex({a : 1, b : 1}, {collation: {locale:"en", strength:2}})
 
> db.c.insertMany([{a: "1", b: "a"}, {a: "1", b: "c"}, {a: "2", b: "b"}, {a: "2", b: "d"}])
 
> db.c.find({a: {$in: ["1", "2"]}}).collation({locale:"en", strength:2}).sort({b:1})
{ "_id" : ObjectId("5f6c67a5697671c039a17929"), "a" : "1", "b" : "a" }
{ "_id" : ObjectId("5f6c67a5697671c039a1792a"), "a" : "1", "b" : "c" }
{ "_id" : ObjectId("5f6c67a5697671c039a1792b"), "a" : "2", "b" : "b" }
{ "_id" : ObjectId("5f6c67a5697671c039a1792c"), "a" : "2", "b" : "d" }

Sprint: Query 2020-10-05, Query 2020-10-19
Participants:
Case:

 Description   
Issue Status as of Sep 30, 2020

ISSUE DESCRIPTION AND IMPACT

Certain queries involving a collated index and sorting on the collated fields can return incorrectly sorted results. This impacts $or and $in operations where each clause is indexed in a way that does not require a blocking sort. The set of results is accurate, but the documents are not ordered correctly.

The bug is that the SORT_MERGE query plan stage incorrectly interprets the collated index keys.

DIAGNOSIS AND AFFECTED VERSIONS

All versions are affected. You can confirm a collation query is affected by using explain(). If a SORT_MERGE stage is in the query plan, and at least one child of the SORT_MERGE stage is an IXSCAN without a FETCH, then the query can return incorrectly sorted results.

REMEDIATION AND WORKAROUNDS

This issue will be corrected in the 4.4.2, 4.2.11, 4.0.21, and 3.6.21 versions of MongoDB. Until these versions are released, the main workaround is to sort results on the client side until a fix version is released.

original description

Given a compound index with a non-simple collation, a multi-point query on a prefix fields together with a sort on a prefix of suffix fields produces incorrect sort results when a collation is specified that matches the collation of the index and a sort merge (SORT_MERGE) is selected in the query execution plan. For example, if a collection has a compound index with a non-simple collation on fields a, b, c, d, then a find command with a matching collation specified that has a multi-point query on a and b fields and is sorted on field c will not produce correctly sorted result.



 Comments   
Comment by Mindaugas Malinauskas [ 30/Oct/20 ]

Replaced "merge sort" with "sort merge".

Comment by Mindaugas Malinauskas [ 05/Oct/20 ]

Corrected fix version.

Comment by Githook User [ 05/Oct/20 ]

Author:

{'name': 'Mindaugas Malinauskas', 'email': 'mindaugas.malinauskas@mongodb.com'}

Message: SERVER-51120 Find queries with MERGE_SORT incorrectly sort the results when the collation is specified

(cherry picked from commit eee7fb8f2c6da144e9d4c3df7887a5ec167f3a6f)
Branch: v3.6
https://github.com/mongodb/mongo/commit/bc69104ff18946278628bfb0411e68a5132c512a

Comment by Githook User [ 04/Oct/20 ]

Author:

{'name': 'Mindaugas Malinauskas', 'email': 'mindaugas.malinauskas@mongodb.com'}

Message: SERVER-51120 Find queries with MERGE_SORT incorrectly sort the results when the collation is specified

(cherry picked from commit eee7fb8f2c6da144e9d4c3df7887a5ec167f3a6f)
Branch: v4.0
https://github.com/mongodb/mongo/commit/cc163fc8eba89c87a9cabc7d3dbb2ab41887ccfc

Comment by Githook User [ 04/Oct/20 ]

Author:

{'name': 'Mindaugas Malinauskas', 'email': 'mindaugas.malinauskas@mongodb.com'}

Message: SERVER-51120 Find queries with MERGE_SORT incorrectly sort the results when the collation is specified

(cherry picked from commit eee7fb8f2c6da144e9d4c3df7887a5ec167f3a6f)
Branch: v4.2
https://github.com/mongodb/mongo/commit/30c8bff0bf35dce45b8d18d35bad0095cd0959e1

Comment by Githook User [ 04/Oct/20 ]

Author:

{'name': 'Mindaugas Malinauskas', 'email': 'mindaugas.malinauskas@mongodb.com'}

Message: SERVER-51120 Find queries with MERGE_SORT incorrectly sort the results when the collation is specified

(cherry picked from commit eee7fb8f2c6da144e9d4c3df7887a5ec167f3a6f)
Branch: v4.4
https://github.com/mongodb/mongo/commit/fe46f2f99b66675fe40228a8964b0b3c1e223c30

Comment by Githook User [ 02/Oct/20 ]

Author:

{'name': 'Mindaugas Malinauskas', 'email': 'mindaugas.malinauskas@mongodb.com'}

Message: SERVER-51120 Find queries with MERGE_SORT incorrectly sort the results when the collation is specified
Branch: master
https://github.com/mongodb/mongo/commit/eee7fb8f2c6da144e9d4c3df7887a5ec167f3a6f

Comment by Eric Sedor [ 30/Sep/20 ]

The following enumerates the exact conditions under which find() evinces the issue:

  • There is an index I on the collection with the same collation specified as for the query.
  • The find query is a multi-point query on a field set P.
  • The find query is sorted on field sequence S.
  • The index I is on field sequence P, S, E, where E - can be an empty sequence.
  • In the execution plan at least one child of the Merge Sort is an IXSCAN. If all children of the Merge Sort are FETCH->IXSCAN, the query should not be affected.
  • The Merge Sort stage is selected for the query. Note, that merge sort is not always selected even if conditions 1-5 hold. The merge sort stage is produced by enumerating (also called "exploding" ) all possible combinations of value fields in set P and converting to the corresponding point queries on those combinations of values. If the number of combinations is greater than 200 then Merge Sort is not selected.
    If the find query does not include all the fields in set P, then the Merge Sort stage is not selected. Thus, missing at least one field in the query makes the query planner to fallback to adding a Sort stage, and the sort is performed correctly.

The root cause of the issue is that for the above conditions, the IXSCAN stages produce field values in the index which have been converted to collation keys and sorts are performed correctly. In the MERGE_SORT stage the collation keys are converted twice, resulting in an incorrect sort.

Generated at Thu Feb 08 05:24:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.