[SERVER-35477] Sort by array field doesn't work Created: 07/Jun/18  Updated: 23/Jul/18  Resolved: 08/Jun/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: ayamnikov Assignee: Nick Brewer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-19402 Change semantics of sorting by array ... Closed
Operating System: ALL
Participants:

 Description   

First case
Running MongoDB 3.6.5 and preparing collection:

# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
> use test
switched to db test
> db.test.drop()
true
> db.test.insert({k: [1, 'x']})
WriteResult({ "nInserted" : 1 })
> db.test.insert({k: [1, 'y']})
WriteResult({ "nInserted" : 1 })
> db.test.insert({k: [1, 'x']})
WriteResult({ "nInserted" : 1 })

 
Making asc sort and getting wrong result:

> db.test.aggregate([{$sort: {k: 1}}])
{ "_id" : ObjectId("5b191e295069772f72806a1c"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b191e2b5069772f72806a1d"), "k" : [ 1, "y" ] }
{ "_id" : ObjectId("5b191e2e5069772f72806a1e"), "k" : [ 1, "x" ] }

 
Expected:

{ "_id" : ObjectId("5b191e295069772f72806a1c"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b191e2e5069772f72806a1e"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b191e2b5069772f72806a1d"), "k" : [ 1, "y" ] }

Meanwhile desc sort works well:

> db.test.aggregate([{$sort: {k: -1}}])
{ "_id" : ObjectId("5b191e2b5069772f72806a1d"), "k" : [ 1, "y" ] }
{ "_id" : ObjectId("5b191e295069772f72806a1c"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b191e2e5069772f72806a1e"), "k" : [ 1, "x" ] }

Second case
In MongoDB 3.4.6 things work slightly different.

Cursor sort asc works wrong:

> db.test.find().sort({k: 1})
{ "_id" : ObjectId("5b19203af8c333c3a244ac98"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b19203df8c333c3a244ac99"), "k" : [ 1, "y" ] }
{ "_id" : ObjectId("5b192040f8c333c3a244ac9a"), "k" : [ 1, "x" ] }

Cursor sort desc works right:

> db.test.find().sort({k: -1})
{ "_id" : ObjectId("5b19203df8c333c3a244ac99"), "k" : [ 1, "y" ] }
{ "_id" : ObjectId("5b19203af8c333c3a244ac98"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b192040f8c333c3a244ac9a"), "k" : [ 1, "x" ] }

Aggregation sort asc works right:

> db.test.aggregate([{$sort: {k: 1}}])
{ "_id" : ObjectId("5b19203af8c333c3a244ac98"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b192040f8c333c3a244ac9a"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b19203df8c333c3a244ac99"), "k" : [ 1, "y" ] }

Aggregation sort desc works right:

> db.test.aggregate([{$sort: {k: -1}}])
{ "_id" : ObjectId("5b19203df8c333c3a244ac99"), "k" : [ 1, "y" ] }
{ "_id" : ObjectId("5b19203af8c333c3a244ac98"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b192040f8c333c3a244ac9a"), "k" : [ 1, "x" ] }

 



 Comments   
Comment by ayamnikov [ 08/Jun/18 ]

Hi Nick,

Thank you for explanation. But I have a question.

Don't you think this is a little bit weird semantics for comparison operator? Does it have any practical sense? I've never met this semantics before. I have some production code which relies on the comparison where i-th element of one array compares to i-th element of other, like in strings comparison. And it just worked until I updated MongoDB. I'm using sorting to order documents by 2-tuple key of the format

 [{unix timestamp rounded to minutes}, {guid}] 

and I want to use here element-by-element comparison. I believe this is the default array comparison semantics in most languages and libraries.

JavaScript (node.js):

> [[1,'x'], [1, 'y'], [1,'x']].sort()
[ [ 1, 'x' ],
 [ 1, 'x' ],
 [ 1, 'y' ] ]

Python:

In [2]: sorted( [[1,'x'], [1, 'y'], [1,'x']] )
Out[2]: [[1, 'x'], [1, 'x'], [1, 'y']]

Even PostgreSQL uses element-by-element array comparison: https://www.postgresql.org/docs/9.1/static/functions-array.html

Regards,

Andrey

Comment by Nick Brewer [ 07/Jun/18 ]

Hi Andrey, 

The behavior you're seeing in both cases is expected. The disparity you're seeing between 3.4.6 and 3.6.5 is the result of a change made in 3.5.11, where the sort semantics were updated to address the issue of find and aggregate returning different results. The full details of that change have been added as an issue link. 

When you perform an ascending sort in 3.5.11 and above, the lowest-valued element of the array is used to order the results that are returned. In the case of your first query:

> db.test.aggregate([{$sort: {k: 1}}])
{ "_id" : ObjectId("5b191e295069772f72806a1c"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b191e2b5069772f72806a1d"), "k" : [ 1, "y" ] }
{ "_id" : ObjectId("5b191e2e5069772f72806a1e"), "k" : [ 1, "x" ] }

The lowest-valued element is 1. This is due the fact that numbers are a lower BSON type within the comparison order that is used to sort results:

https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#sort-pipeline-metadata

Conversely, when you perform a descending sort, the highest-valued element of the array is used. In the case of your second query:

db.test.aggregate([{$sort: {k: -1}}])
{ "_id" : ObjectId("5b191e2b5069772f72806a1d"), "k" : [ 1, "y" ] }
{ "_id" : ObjectId("5b191e295069772f72806a1c"), "k" : [ 1, "x" ] }
{ "_id" : ObjectId("5b191e2e5069772f72806a1e"), "k" : [ 1, "x" ] }

The highest-value element is "y", as strings are higher within the comparison order. 

Regards,

Nick

 

 

Generated at Thu Feb 08 04:39:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.