[SERVER-50080] sort not working on projection created fields in find Created: 03/Aug/20  Updated: 27/Oct/23  Resolved: 18/Aug/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.0-rc14
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ivan Cherviakov Assignee: Backlog - Query Team (Inactive)
Resolution: Works as Designed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to DOCS-13808 [Server] Make documentation about ord... Backlog
Assigned Teams:
Query
Operating System: ALL
Steps To Reproduce:

db.collection.find({ userId: 'userId', startDate: { $gt: ISODate("2020-07-01") } }, { time: { $divide: [{ $subtract: ['$endDate', '$startDate']  }, 3600000] } }).sort({ time: -1 })

Participants:

 Description   

Given query in mongodb shell:

db.collection.find({ userId: 'userId', startDate: { $gt: ISODate("2020-07-01") } }, { time: { $divide: [{ $subtract: ['$endDate', '$startDate']  }, 3600000] } }).sort({ time: -1 })

sort does not affect result, always get the same no matter use

sort({ time: 1})

or

sort({ time -1 })



 Comments   
Comment by David Storch [ 13/Aug/20 ]

I'm returning this ticket to our triage queue. The behavior here is admittedly non-obvious, but the semantics of find have long been that the sort logically occurs before the projection. I suggest that we close this ticket as "Works as Designed" and pursue the suggested documentation improvements in DOCS-13808.

Comment by Ian Boros [ 04/Aug/20 ]

Filed DOCS-13808 summarizing the necessary docs changes (I included links to the sections Asya and Dan linked to along with some others). I'm returning this to the backlog now.

Comment by Asya Kamsky [ 04/Aug/20 ]

There's a note about this in the docs

Comment by Ivan Cherviakov [ 03/Aug/20 ]

Thanks for explanation, indeed pretty confusing behaviour.

Comment by Ian Boros [ 03/Aug/20 ]

The problem here is that the sort is performed before the projection. When the sort is done, each document has the same value for the field 'time'. Since our sort implementation happens to be stable, it is effectively a noop and you get this unintuitive behavior. You can verify this by running an explain() of the query.

This does raise an interesting issue though. Historically, find() queries always provide results as if the operators were executed in the following order:
match -> sort -> skip -> limit -> project

There are cases where the actual execution of the query is done in a different order than this. For example, sometimes the 'limit' and 'sort' are combined into a top-k sort. We've also implemented an optimization where project stages are moved earlier than sort stages when the planner can determine that the project stage will make the documents smaller (see SERVER-15200). None of these optimizations alter the semantics of find(), though. This has probably never come up before because expressions in find projection areĀ  new to 4.4.

After discussing with charlie.swanson we've agreed to put this into needs triage for the query optimization team. While I don't think we should change the semantics of find(), it would be good if we can come up with a way to make this more user-friendly.

Note that if we did want to change the semantics of find(), we'd likely break existing queries which sort on a field which is modified via $slice, $elemMatch or '$' in the projection.

Generated at Thu Feb 08 05:21:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.