[SERVER-31601] Ranking(using project) + sort + skip&limit(pagination) not working as expected Created: 15/Oct/17  Updated: 27/Oct/23  Resolved: 18/Oct/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Tanmay Awasekar Assignee: Mark Agarunov
Resolution: Works as Designed Votes: 0
Labels: Bug, MAREF
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

When I perform a sort using a project fields value like this:

[{
  '$project': {
    'metaData': {
      '$let': {
        'vars': {
          'array': []
        },
        'in': {
          'sizeArray': {
            '$size': '$$array'
          }
        }
      }
    }
  }
}, {
  "$sort": {
    "metaData.sizeArray": 1
  }
}].

Here I perform a sort using the size of the array. In this example the size of the array is zero, but it can vary if the array is made out of value of a field in the collection.
Everything works fine until now. But if I perform $skip then $limit after this many documents are repeated when the value of $skip changes. For example:

db.getCollection('venues').aggregate([{
  '$project': {
    'metaData': {
      '$let': {
        'vars': {
          'array': []
        },
        'in': {
          'sizeArray': {
            '$size': '$$array'
          }
        }
      }
    }
  }
}, {
  "$sort": {
    "metaData.sizeArray": 1
  }
}, {
  "$skip": 4
}, {
  "$limit": 4
}]);

When the value of skip goes from 4..8..12, many documents are repeated.



 Comments   
Comment by Mark Agarunov [ 18/Oct/17 ]

Hello tanmayawasekar,

Thank you for the information. Looking over this, I believe this may have to do with how sort works. When sorting by a field, the only guarantee is that the results will be in order in relation to that field. If you have multiple documents with the same value in the field you are sorting by, there is no guarantee that these documents will be sorted in the same order every time. That is, if you have the following documents, sorted by x:1:

{x:1,y:4}
{x:1,y:1}
{x:1}
{x:1,y:3}
{x:2,y:0}
{x:2,y:2}
{x:2,y:1}

The only guarantee $sort is making is that the documents with x:1 will come before x:2. If this query was run again, it could produce the following, also valid and properly sorted, result set:

{x:1}
{x:1,y:3}
{x:1,y:4}
{x:1,y:1}
{x:2,y:1}
{x:2,y:0}
{x:2,y:2}

If you are using limit and skip, the order of the documents can change between queries. As an example, if the two lists of documents above are the results of running the same query twice, and if you were to use skip: 1 and limit: 1 on the first query, and then skip:3 and limit:1 on the second, they would both return the same document, x:1,y:1, causing the duplicate results you are seeing. To ensure that the sort order is consistent, you could add a unique field to sort on, such as _id as mentioned in DOCS-9994 which contains some more information regarding this behavior of sort.

Thanks,
Mark

Comment by Tanmay Awasekar [ 18/Oct/17 ]

MongoDB version - v3.4.9,
Current Setup is - Standalone,
The explain output:

{
    "stages" : [ 
        {
            "$cursor" : {
                "query" : {},
                "fields" : {
                    "_id" : 1
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "kiddoo.venues",
                    "indexFilterSet" : false,
                    "parsedQuery" : {},
                    "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "direction" : "forward"
                    },
                    "rejectedPlans" : []
                }
            }
        }, 
        {
            "$project" : {
                "_id" : true,
                "metaData" : {
                    "$let" : {
                        "vars" : {
                            "array" : {
                                "$const" : []
                            }
                        },
                        "in" : {
                            "sizeArray" : {
                                "$size" : [ 
                                    "$$array"
                                ]
                            }
                        }
                    }
                }
            }
        }, 
        {
            "$sort" : {
                "sortKey" : {
                    "metaData.sizeArray" : 1
                },
                "limit" : NumberLong(12)
            }
        }, 
        {
            "$skip" : NumberLong(8)
        }
    ],
    "ok" : 1.0
},

For the dataset, you can replicate the issue using any collection which has more than 20 documents. Just copy the query I have given above.

Comment by Mark Agarunov [ 17/Oct/17 ]

Hello tanmayawasekar,

Thank you for the report. To get a better idea of what may be causing this behavior, I'd like to get some additional information.

  • Which version of MongoDB is being used?
  • What is the current setup of MongoDB? That is, is this a standalone instance, sharded, and/or in a replicaset?
  • Could you please append .explain(true) to the end of your query and provide the output?
  • If possible, could you provide the dataset you're using for this query so that we can reproduce the issue?

I've created a secure upload portal so that you can send us these files privately.
This should give some more insight into the issue.

Thanks,
Mark

Comment by Tanmay Awasekar [ 15/Oct/17 ]

Here is the entire example again.

db.getCollection('venues').aggregate([{
  '$project': {
    'metaData': {
      '$let': {
        'vars': {
          'array': []
        },
        'in': {
          'sizeArray': {
            '$size': '$$array'
          }
        }
      }
    }
  }
}, {
  "$sort": {
    "metaData.sizeArray": 1
  }
}, {
  "$skip": 8
}, {
  "$limit": 4
}])

Generated at Thu Feb 08 04:27:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.