[SERVER-51498] Sorting on duplicate values causes repeated results with limit and skip Created: 12/Oct/20  Updated: 27/Oct/23  Resolved: 21/Jan/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jacob Hartman Assignee: Edwin Zhou
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File mongo-bug-1.png    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-51725 Duplicate documents while using "Sort... Closed
is duplicated by SERVER-53181 Lost results of search using sort, sk... Closed
is duplicated by SERVER-57613 Sort + limit returns data in random o... Closed
is duplicated by SERVER-53297 Return sorted query results in random... Closed
Problem/Incident
Related
related to SERVER-28195 $skip followed by $limit in aggregati... Closed
is related to DOCS-9994 Unique field must be included to ensu... Closed
Operating System: ALL
Steps To Reproduce:

Tested on Windows 4.4.1, 4.2.10 and Linux Docker 4.4.1 and 4.2.10.

4.2.10 on both OS's did not have this issue and return result groups as expected with results not repeating themselves. Results appeared to be sorted by _id as well as the sort value requested.  This default sort by _id does not appear in 4.4.1, which seems to be causing the issue.

4.4.1 on both OS's appeared to have the issue described. Manually adding a sort by _id causes the issue to disappear.

  • Insert a series of documents (tested with about 20 document) containing one field with a duplicate value for all documents and one field with a value unique to each document.
  • Perform a query containing a sort on the field with the duplicate value and limit the results to some value (larger limits will make the issue more noticeable, tested with 5 or 10)
  • Perform the same query, but with a skip >= to the limit.
  • Expected: As long as the skip >= limit, the second results group should return unique values not in the first group
  • Current results: Values from the first group reappear in the second group. (In the image, note that 3_item, 0_item, 1_item, and 2_item appear in both groups)
Participants:
Case:

 Description   
Issue Status as of Jan 15, 2020

ISSUE DESCRIPTION AND IMPACT

As of MongoDB 4.4, find+sort operations that perform blocking/in-memory sorts on fields that contain non-unique values are much more likely to result in unstable sort orders between different operations. This most directly impacts pagination style operations and causes results to be duplicated or omitted from one query to the next.

This is because in MongoDB 4.4, find+sort began using the same sorting logic as the aggregation $sort stage, which has always been known to be unstable. Prior to MongoDB 4.4, find often provided, but did not guarantee, stable results.

DIAGNOSIS AND AFFECTED VERSIONS

All versions of MongoDB are affected by this issue, with impact being more likely in MongoDB 4.4.

If a blocking sort operation sorts on a non-unique field, unstable sort results can be observed using .limit() and .skip():

db.test_col.find().sort({ counter: -1 }).skip(0).limit(5)
{  "_id" : ObjectId("5f84bcaafe52900c1be1060e"),  "counter" : 2,  "name" : "2_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be10610"),  "counter" : 2,  "name" : "4_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be1060c"),  "counter" : 2,  "name" : "0_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be1060d"),  "counter" : 2,  "name" : "1_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be1060f"),  "counter" : 2,  "name" : "3_item" }
 
db.test_col.find().sort({ counter: -1 }).skip(5).limit(5)
{  "_id" : ObjectId("5f84bcaafe52900c1be10610"),  "counter" : 2,  "name" : "4_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be1060c"),  "counter" : 2,  "name" : "0_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be1060d"),  "counter" : 2,  "name" : "1_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be1060f"),  "counter" : 2,  "name" : "3_item" }
{  "_id" : ObjectId("5f84bcaafe52900c1be10613"),  "counter" : 2,  "name" : "7_item" }

In this example, 0_item, 1_item, 3_item, and 4_item occur in both result sets; the skip and limit values are not enough to create distinct result sets.

REMEDIATION AND WORKAROUNDS

Review the 4.4 release notes for information about these changes. For more detail, see this sort stability documentation. In short, one quick fix is to add _id to sorts that require stability between queries.

After evaluating options, we've unfortunately found that at this time, we cannot completely guarantee sort stability on non-unique values without an unacceptable performance impact. In the absence of a complete guarantee, we aren't able to justify the level of effort that would provide a partial guarantee.

Original Description

In 4.4.1, if a duplicate, non-unique value is sorted on and the results are limited and skipped to create groups of results, values in one group of results appear in other groups of results. So performing sorts, limits, and skips for something like pagination will cause results to be repeated in other pages.

 



 Comments   
Comment by Edwin Zhou [ 04/Dec/20 ]

Hi chad@onspring.com,

I recognize the frustration that the unexpected changes made to query sort in 4.4 causes. We'll be considering your input when we clarify the upgrade docs for MongoDB 4.4 to state that upgrading will affect sort stability when sorting by non-unique fields. This work is tracked in DOCS-9994 and has been reprioritized.

This should not have been closed or marked as duplicate. It appears unrelated to aggregation.

To clarify the "Duplicate" resolution: up until MongoDB 4.4, query sort and aggregation sort behaved differently due to differing codepaths. In 4.4, we unified the query sort codepath with the aggregation sort codepath, so you can expect query sort to behave as aggregation sort. Given that context, the issue detailed in SERVER-28195, which occurs in aggregation sort, becomes relevant to the query sort issue seen in this ticket. After review, we agree with you that it this issue doesn't directly duplicate SERVER-28195.

Best,
Edwin

Comment by Chad Kreimendahl [ 04/Dec/20 ]

We've never sorted by _id in our code when sorting on something non-unique, and yet with unchanging data, the results returned the same at least over short (< 5 minute) windows, each time.

So whatever changed, whether you think it did or not, is having a direct impact on every single location where we were previously sorting.  It should be listed as a rather large breaking change in 4.4, as the interaction is now wildly different than prior to 4.4

Comment by Asya Kamsky [ 04/Dec/20 ]

Note that if you do not specify $sort then the order of documents is NOT guaranteed (and has never been guaranteed) - in this scenario when you sort by one field "a" and several documents have the same value of "a", it's never been guaranteed that running that twice would give the same order (within those several documents with same value of a).    If you want to specify an order, you have to add another field (like sorting by "a", "_id" for instance).

 

Comment by Chad Kreimendahl [ 30/Nov/20 ]

This should not have been closed or marked as duplicate. It appears unrelated to aggregation.  We're having to potentially make a massive set of updates to our code (literally every single place we potentially do any kind of sorting), as our results when using 4.4 are entirely different an unexpected when compared to 4.2, 4.0 and 3.6 (other versions we're currently using).

 

Everything works 100% as expected and returns the list exactly as you would want (with the skip) every single time in 3.6 through 4.2.  However, results from 4.4 servers feel like they come back randomly. Aka: same query on unchanged data yields different results when using skip/limit.

 

This should be considered a rather major [aka massive] breaking change for anyone who uses skip or limit, which I'm going to guess is near the totality of the user base.

Comment by Jacob Hartman [ 14/Oct/20 ]

I didn't notice any reference to this in the documentation regarding sort, limit, or skip. They should be updated to reflect this new strategy and made very clear and obvious as it's very confusing otherwise. I spent hours trying to figure out if my query was wrong and then finally stumbled on this issue.

Regardless, I feel this strategy defies developer expectation, as most people would expect a database to not repeat results in different groups and would not expect to be forced to sort on two fields when you only expect to sort on one. Although the new strategy removed redundant code (which is always great), the new default behavior works in a way that is not very obvious to developers. I'd ask for a reconsideration regarding this default behavior, as it will mostly likely be application breaking for people upgrading to newer MongoDB versions with a less-than-obvious solution.

 

Thanks,
Jacob

Comment by Edwin Zhou [ 14/Oct/20 ]

Hi hartmanj@ainfosec.com,

Thanks for reporting this behavior! I was able to reproduce your issue and based on what you've described, I believe this behavior is expected in 4.4.1 and is a duplicate of SERVER-28195. As you stated, add an additional sort on a unique value, e.g. _id, to avoid this issue.

The difference in behavior you observed is a result of MongoDB 4.2 using a different codepath for aggregation sort and query sort. In MongoDB 4.4, we eliminated this technical debt and query sort now uses the same code and has the same behavior as aggregation sort.

Edwin

Comment by Bruce Lucas (Inactive) [ 12/Oct/20 ]

Some more information:

Just hinting the _id index does not make the behavior better:

> db.c.find().sort({x:1}).skip(0).limit(5).hint({_id:1})
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be9"), "x" : "x", "i" : 2 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34beb"), "x" : "x", "i" : 4 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be7"), "x" : "x", "i" : 0 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be8"), "x" : "x", "i" : 1 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bea"), "x" : "x", "i" : 3 }
> db.c.find().sort({x:1}).skip(5).limit(5).hint({_id:1})
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34beb"), "x" : "x", "i" : 4 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be7"), "x" : "x", "i" : 0 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be8"), "x" : "x", "i" : 1 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bea"), "x" : "x", "i" : 3 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bee"), "x" : "x", "i" : 7 }

On the other hand explicitly sorting on $natural does make the behavior as expected:

> db.c.find().sort({x:1}).limit(5).sort({$natural: 1})
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be7"), "x" : "x", "i" : 0 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be8"), "x" : "x", "i" : 1 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34be9"), "x" : "x", "i" : 2 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bea"), "x" : "x", "i" : 3 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34beb"), "x" : "x", "i" : 4 }
> db.c.find().sort({x:1}).skip(5).limit(5).sort({$natural: 1})
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bec"), "x" : "x", "i" : 5 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bed"), "x" : "x", "i" : 6 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bee"), "x" : "x", "i" : 7 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bef"), "x" : "x", "i" : 8 }
{ "_id" : ObjectId("5f84ae3bcf5e800fa7c34bf0"), "x" : "x", "i" : 9 }

Generated at Thu Feb 08 05:25:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.