[SERVER-29425] Add an expression to sort an array Created: 02/Jun/17 Updated: 08/Jan/24 Resolved: 19/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Neil Shweky (Inactive) |
| Resolution: | Fixed | Votes: | 5 |
| Labels: | expression | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||
| Sprint: | QE 2021-11-15 | ||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||
| Description |
|
Syntax:
Examples it should work on:
|
| Comments |
| Comment by Githook User [ 06/Dec/21 ] | ||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'}Message: | ||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 24/Nov/21 ] | ||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Neil Shweky', 'email': 'neilshweky@gmail.com', 'username': 'Neilshweky'}Message: | ||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 16/Nov/21 ] | ||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: | ||||||||||||||||||||||||||||||||||||
| Comment by Billy Donahue [ 16/Nov/21 ] | ||||||||||||||||||||||||||||||||||||
|
Hi, the latest commit here, ca9bb0300c804617e936c2e2516b441a9474e355, I'm on a Linux virtual workstation building with:
I can't say I understand why this doesn't break in production, but I believe my local build should be a supported scons invocation. The undefined references are because pattern_cmp.h is using FieldRef in its inline functions, but files that include pattern_cmp.h don't depend on the library that provides the implementation of the FieldRef functions it needs. There's another problem with the pattern_cmp.h, which is the static function checkSortClause. This function will have a distinct and separate definition in each .cpp file that includes pattern_cmp.h. It is called in two places: db/pipeline/expression.cpp and db/update/push_node.cpp. These will each refer to their own private copy of the function. There's no need for this duplication, and it's dangerous because any inline function in a header that refers to checkSortClause will create a One Definition Rule violation as soon as that header is included by 2 .cpp files. I don't see a reason for checkSortClause to be defined in a header. It's not a template or anything, so it can move into a .cpp file and avoid all this trouble. | ||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 15/Nov/21 ] | ||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Neil Shweky', 'email': 'neilshweky@gmail.com', 'username': 'Neilshweky'}Message: | ||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 15/Nov/21 ] | ||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Neil Shweky', 'email': 'neilshweky@gmail.com', 'username': 'Neilshweky'}Message: | ||||||||||||||||||||||||||||||||||||
| Comment by Andrii Dobroshynski (Inactive) [ 04/Nov/21 ] | ||||||||||||||||||||||||||||||||||||
|
Code review URL: https://github.com/10gen/mongo/pull/1634 | ||||||||||||||||||||||||||||||||||||
| Comment by Errol Kutan [ 20/Apr/21 ] | ||||||||||||||||||||||||||||||||||||
|
This would be very nice to have. There are a handful of particular use-cases. A general example would be if you are using the aggregation pipeline to mimic the functionality of the traditional update operators, but with improvements to overcome shortcomings. A concrete example of this is the $push : { $each : , $sort : , $slice } functionality, which will not allow you to treat the resulting array into which you are pushing as a set. The use-case driving this doesn't seem rare or niche–in this case, you have a schema that embeds the top X of any related data entity (could be top X comments, most recent X comments, orders, etc) and that data may come off a queue in an order that is not consistent with the order in which they are maintained in the array (eg–you want most recent X purchases but they could possibly arrive in a random order from the message queuing system. Moreover, you want to be able to deduplicate retransmitted messages from the queue.
In this case, the standard $push : { $each : , $sort : , $slice } operators do not work, so you must resort to the aggregation pipeline. However, you cannot use the $unwind stage (and subsequent $sort, then $group) when using the agg pipeline as an update operator, so you need an alternative way of sorting. Currently, you need to create a custom function, so the explicit sort expression is not NEEDED, but it feels silly to require a custom javascript expression for something that is a trivial operation and probably commonly needed, if customers are using MDBQL and MongoDB data modelling correctly . | ||||||||||||||||||||||||||||||||||||
| Comment by Pawel Terlecki [ 01/Mar/20 ] | ||||||||||||||||||||||||||||||||||||
|
Since this expression is no longer necessary for finding index inconsistencies across shards, we decided to ship it later with all the requested functionality. I am discarding the code review. | ||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 11/Feb/20 ] | ||||||||||||||||||||||||||||||||||||
|
The sharding team would like this to be added to our quick win list. It would help with writing a pipeline that can be used with $indexStats to find indexes that are inconsistent across shards. | ||||||||||||||||||||||||||||||||||||
| Comment by Sagar [ 27/Sep/19 ] | ||||||||||||||||||||||||||||||||||||
|
It happens to be similar to the slower solution you posted but adding my solution here in case someone is interested. More details can be found here https://stackoverflow.com/questions/58065037/mongodb-safely-sort-inner-array-after-group
| ||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 27/Sep/19 ] | ||||||||||||||||||||||||||||||||||||
|
Set union does guarantee order by current implementation (a set is achieved via maintaining a sorted set, which is how it can quickly determine duplicates). Now, even if that may not strictly be guaranteed in some arbitrary future implementation, hopefully we will add a proper sort array expression before then. There is a slower way to do it I show here: https://github.com/asya999/bits-n-pieces/blob/master/scripts/sortArray.js but I'd recommend using $setUnion instead. | ||||||||||||||||||||||||||||||||||||
| Comment by Sagar [ 27/Sep/19 ] | ||||||||||||||||||||||||||||||||||||
|
@Asya Sorry, could you explain how it is working ? why do we need a set union ? If I remove set union it doesn't work. As far as I know from the docs set union doesn't guarantee any ordering. Did you mean to use $lt or something else? | ||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 19/Aug/19 ] | ||||||||||||||||||||||||||||||||||||
|
FYI, a workaround is to use existing aggregation expressions to sort the array like this for first example:
and like this for second example:
| ||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 02/Jun/17 ] | ||||||||||||||||||||||||||||||||||||
|
I think it depends on the syntax we specify. We can say the sortBy field is relative to array (so it must be a path name). Sorting "a.0" would only make sense if a is an array of arrays (and it's ambiguous since it could also be an array of objects where field named "0" exists and should be used as sort key). | ||||||||||||||||||||||||||||||||||||
| Comment by Kyle Suarez [ 02/Jun/17 ] | ||||||||||||||||||||||||||||||||||||
|
What happens if I sort by "a.0"? Does it sort by the first element of the nested arrays within a? Or does it sort according to the field whose name is "0"? |