[SERVER-28195] $skip followed by $limit in aggregation resort & lost records when $sort by equal values Created: 05/Mar/17  Updated: 27/Oct/23  Resolved: 13/Mar/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alexandr Assignee: Kelsey Schubert
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

targetMinOS: Windows 7/Windows Server 2008 R2
db version v3.4.1
git version: 5e103c4f5583e2566a45d740225dc250baacfbd7


Attachments: File test-28195.js    
Issue Links:
Documented
is documented by DOCS-9994 Unique field must be included to ensu... Closed
Duplicate
is duplicated by SERVER-30338 $sort + $limit optimizer failing in s... Closed
is duplicated by SERVER-33107 Incorrect result while using pipeline... Closed
is duplicated by SERVER-43105 Sort operation retun duplicate element Closed
Related
is related to SERVER-27716 mongos 3.2.10, sharded cluster, skip ... Closed
is related to SERVER-51498 Sorting on duplicate values causes re... Closed
Operating System: ALL
Steps To Reproduce:

> db.issue.find()

{ "_id" : ObjectId("58b3d25422ad9b04484145f9"), "author" :ObjectId("589d65ac10ee7a29a08e022a"), "type" : "customs", "date" : ISODate("2017-0227T07:16:36.173Z"), "likes" : [ ObjectId("589d65ac10ee7a29a08e022a") ], "visible" : 1,"country" : ObjectId("58a1a0bdc98ea02284fe077b"), "guideBody" : { "text" : "инструкция", "caption" : "Заголовок" }, "requireModeration" : 1, "rating" : 0 }
{ "_id" : ObjectId("58b3d25d22ad9b04484145fa"), "author" :ObjectId("589d65ac10ee7a29a08e022a"), "type" : "customs", "date" : ISODate("2017-0227T07:16:45.772Z"), "likes" : [ ], "visible" : 1, "country" :ObjectId("58a1a0bdc98ea02284fe077b"), "guideBody" : { "text" : "инструкция22", "caption": "Заголовок 22" }, "requireModeration" : 1, "rating" : 0 }
{ "_id" : ObjectId("58b3d2d122ad9b04484145fb"), "author" :ObjectId("589d65ac10ee7a29a08e022a"), "type" : "customs", "date" : ISODate("2017-0227T07:18:41.065Z"), "likes" : [ ObjectId("589d65ac10ee7a29a08e022a") ], "visible" : 1,"country" : ObjectId("58a1a0bdc98ea02284fe077b"), "guideBody" : { "text" : "что-то", "caption" : "Ииииха" }, "requireModeration" : 1, "rating" : 0 }
{ "_id" : ObjectId("58b9656e905c7a279806f8f6"), "author" :ObjectId("589d65ac10ee7a29a08e022a"), "type" : "customs", "date" : ISODate("2017-0303T12:45:34.161Z"), "likes" : [ ], "visible" : 1, "requireModeration" : 1, "country" :ObjectId("58a1a0bdc98ea02284fe077b"), "guideBody" : { "text" : "йцуйцу", "caption" :"йцуйцу" }, "rating" : 0 }
{ "_id" : ObjectId("58b96575905c7a279806f8f7"), "author" :ObjectId("589d65ac10ee7a29a08e022a"), "type" : "customs", "date" : ISODate("2017-0303T12:45:41.038Z"), "likes" : [ ], "visible" : 1, "requireModeration" : 1, "country" :ObjectId("58a1a0bdc98ea02284fe077b"), "guideBody" : { "text" : "йцуйцу11", "caption" :"йцуйцуй11" }, "rating" : 0 }
{ "_id" : ObjectId("58b96578905c7a279806f8f8"), "author" :ObjectId("589d65ac10ee7a29a08e022a"), "type" : "customs", "date" : ISODate("2017-0303T12:45:44.677Z"), "likes" : [ ], "visible" : 1, "requireModeration" : 1, "country" :ObjectId("58a1a0bdc98ea02284fe077b"), "guideBody" : { "text" : "йцуйцу1122", "caption" :"йцуйцуй1122" }, "rating" : 0 }
{ "_id" : ObjectId("58b96e44905c7a279806f8f9"), "author" :ObjectId("589d65ac10ee7a29a08e022a"), "type" : "customs", "date" : ISODate("2017-0303T13:23:16.176Z"), "likes" : [ ], "visible" : 1, "requireModeration" : 1, "country" :ObjectId("58a1a0bdc98ea02284fe077b"), "guideBody" : { "text" : "text", "caption" : "Title" }, "rating" : 10 }

Participants:

 Description   

Seems, issue exists only when you $sort records having equal value (rating here)

>db.issue.aggregate( [ 
{ '$match':       { country: ObjectId("58a1a0bdc98ea02284fe077b"),        visible: 1,        type: 'customs' } },  
{ '$sort': { rating: -1 } },   
{ '$project': {_id:1}}
]);

{ "_id" : ObjectId("58b96e44905c7a279806f8f9") }
{ "_id" : ObjectId("58b3d25422ad9b04484145f9") }
{ "_id" : ObjectId("58b3d25d22ad9b04484145fa") }
{ "_id" : ObjectId("58b3d2d122ad9b04484145fb") }
{ "_id" : ObjectId("58b9656e905c7a279806f8f6") }
{ "_id" : ObjectId("58b96575905c7a279806f8f7") }
{ "_id" : ObjectId("58b96578905c7a279806f8f8") }

> db.issue.aggregate( [ 
{ '$match':       { country: ObjectId("58a1a0bdc98ea02284fe077b"),        visible: 1,        type: 'customs' } },   
{ '$sort': { rating: -1 } },   
{ '$project': {_id:1}},   
*{ '$skip': 0 },   
{ '$limit': 5 } *
] );

{ "_id" : ObjectId("58b96e44905c7a279806f8f9") }
{ "_id" : ObjectId("58b9656e905c7a279806f8f6") }
{ "_id" : ObjectId("58b96575905c7a279806f8f7") }
{ "_id" : ObjectId("58b3d25d22ad9b04484145fa") }
{ "_id" : ObjectId("58b3d25422ad9b04484145f9") }

here we:

  • have resorted results
  • lost * { "_id" : ObjectId("58b3d2d122ad9b04484145fb") }

    *

there is no this record on next 5

>db.issue.aggregate( [
{ '$match':       { country: ObjectId("58a1a0bdc98ea02284fe077b"),        visible: 1,        type: 'customs' } },   
{ '$sort': { rating: -1 } },   
{ '$project': {_id:1}},  
{ '$skip': 5 },   
{ '$limit': 5 } 
] );

{ "_id" : ObjectId("58b96575905c7a279806f8f7") }
{ "_id" : ObjectId("58b96578905c7a279806f8f8") }



 Comments   
Comment by Ramon Fernandez Marina [ 14/Mar/17 ]

trueshura, to add to Thomas' answer, here's a small test that I think shows how $sort behaves:

db.c.drop()
 
var num = 10
 
for (i=0; i<num; i++) {
    db.c.insert({_id : i, data : 0})
}
 
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 1}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 2}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 3}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 4}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 5}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 6}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 7}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 8}, {$group : {_id: "$data", order : { $push : "$_id" }}}])
db.c.aggregate([{'$sort': {data : 1}}, {$skip : 0}, {$limit : 9}, {$group : {_id: "$data", order : { $push : "$_id" }}}])

The same behavior exhibits in 3.4, 3.2, 3.0, 2.6 and 2.4 (the integers in the order array are the _id of the documents):

{ "_id" : 0, "order" : [ 0 ] }
{ "_id" : 0, "order" : [ 0, 1 ] }
{ "_id" : 0, "order" : [ 0, 1, 2 ] }
{ "_id" : 0, "order" : [ 1, 3, 0, 2 ] }
{ "_id" : 0, "order" : [ 3, 4, 1, 0, 2 ] }
{ "_id" : 0, "order" : [ 3, 4, 1, 0, 2, 5 ] }
{ "_id" : 0, "order" : [ 3, 4, 1, 0, 5, 2, 6 ] }
{ "_id" : 0, "order" : [ 7, 4, 1, 3, 5, 0, 2, 6 ] }
{ "_id" : 0, "order" : [ 8, 4, 1, 7, 5, 3, 0, 2, 6 ] }

As Thomas points out, an additional sort is needed to get the behavior you're looking for.

Regards,
Ramón.

Comment by Kelsey Schubert [ 10/Mar/17 ]

Hi trueshura,

There are number of reasons that the ordering of results with the same sort key might not be stable:

  1. The query execution stages for performing blocking sort operations in find/aggregation do not use a stable sort algorithm.
  2. Unpredictability of network latency and getMore ordering for queries on sharded clusters may result in different orders. This is similar to the behavior described in SERVER-27716, which was closed as Works as Designed.
  3. Changes to the underlying storage. For example, document moves on the MMAPv1 storage engine can result in repeated COLLSCANs returning the data in different orders. Therefore, even if the blocking sort query execution algorithm was stable, this still wouldn't be sufficient to guarantee consistent ordering of equal-keyed results.

To ensure that the sort order is consistent across queries, we recommend including a unique field, such as _id in your sort order. This suggestion is described by Mathias on SERVER-11407.

I've opened DOCS-9994 to improve our documentation around this behavior. Please feel free to vote for it and watch it for updates.

Thank you,
Thomas

Comment by Kelsey Schubert [ 05/Mar/17 ]

Hi trueshura,

Thank you for reporting this behavior, we're able to reproduce and are investigating. Please continue to watch this ticket for updates.

Kind regards,
Thomas

Generated at Thu Feb 08 04:17:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.