[SERVER-29425] Add an expression to sort an array Created: 02/Jun/17  Updated: 08/Jan/24  Resolved: 19/Jan/22

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 5.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Neil Shweky (Inactive)
Resolution: Fixed Votes: 5
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-46372 Use unordered object comparison in ag... Closed
Documented
Duplicate
is duplicated by SERVER-42784 Allow sorting arrays as an aggregatio... Closed
Related
is related to SERVER-60967 Implement $sortArray in SBE Closed
Backwards Compatibility: Fully Compatible
Sprint: QE 2021-11-15
Participants:
Case:

 Description   

Syntax:

{$project:{orderedArray:{$sort:{path:<array-returning-expression>[, sortBy:<someFieldInArray>, <ascending-vs-descending-option>]}}

Examples it should work on:

Input: {a: [ 1, 10, 5, 3 ]}
{newA:{$sort:{path:"$a"}}}
returns:  { newA: [ 1, 3, 5, 10 ] }
 
Input: { a: [ { x:1, y:99 }, { x: 9, y: 4 } ] }
{newA: {$sort:{path:"$a", sortBy: "$a.y" }}}
returns: { newA: [ { x: 9, y: 4 }, { x:1, y:99 } ] }



 Comments   
Comment by Githook User [ 06/Dec/21 ]

Author:

{'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'}

Message: SERVER-29425 $sortArray for classic engine: fix collator for whole values
Branch: master
https://github.com/mongodb/mongo/commit/7c94bb89603010e1fe83e0e72369df2e802324b8

Comment by Githook User [ 24/Nov/21 ]

Author:

{'name': 'Neil Shweky', 'email': 'neilshweky@gmail.com', 'username': 'Neilshweky'}

Message: SERVER-29425 $sortArray for classic engine: fix collator for whole values
Branch: master
https://github.com/mongodb/mongo/commit/16766b22cc5d80732543808c9f8af325a8b71080

Comment by Githook User [ 16/Nov/21 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-29425 move pattern_cmp into a .cpp file
Branch: master
https://github.com/mongodb/mongo/commit/3e90750648f198504586b71000e8e0b526516392

Comment by Billy Donahue [ 16/Nov/21 ]

Hi, the latest commit here, ca9bb0300c804617e936c2e2516b441a9474e355,
particularly the src/mongo/db/update/pattern_cmp.h header, causes a build break for me.

I'm on a Linux virtual workstation building with:

python3 buildscripts/scons.py \
    VARIANT_DIR=dynamic_gcc \
    MONGO_GIT_HASH=unknown \
    LLVM_SYMBOLIZER=/opt/mongodbtoolchain/v3/bin/llvm-symbolizer \
    CCACHE=ccache \
    ICECC=icecc \
    VERBOSE=1 \
    --opt=off \
    --dbg=on \
    --ssl \
    --ninja \
    --link-model=dynamic \
    --variables-files=etc/scons/mongodbtoolchain_v3_gcc.vars \
    --modules= \
    generate-ninja
 
ninja build/dynamic_gcc/install/bin/transport_test
 
...
 
[304/523] Linking build/dynamic_gcc/mongo/db/catalog/libcollection.so
FAILED: build/dynamic_gcc/mongo/db/catalog/libcollection.so
export PATH='/opt/mongodbtoolchain/v3/bin:/usr/local/bin:/opt/bin:/bin:/usr/bin';export CCACHE_NOCPP2='1';export ICECC_CARET_WORKAROUND='0';export CCACHE_PREFIX='/home/ubuntu/mongo-dev/mongo/build/scons/icecream/run-icecc.sh';/usr/bin/icerun /opt/mongodbtoolchain/v3/bin/g++ @build/dynamic_gcc/mongo/db/catalog/libcollection.so.rsp
src/mongo/db/update/pattern_cmp.h:67: error: undefined reference to 'mongo::FieldRef::FieldRef(mongo::StringData)'
src/mongo/db/update/pattern_cmp.h:73: error: undefined reference to 'mongo::FieldRef::getPart(unsigned char) const'
src/mongo/db/update/pattern_cmp.h:77: error: undefined reference to 'mongo::FieldRef::dottedField(unsigned char) const'
collect2: error: ld returned 1 exit status
[307/523] Linking build/dynamic_gcc/mongo/db/libop_observer.so
FAILED: build/dynamic_gcc/mongo/db/libop_observer.so
export PATH='/opt/mongodbtoolchain/v3/bin:/usr/local/bin:/opt/bin:/bin:/usr/bin';export CCACHE_NOCPP2='1';export ICECC_CARET_WORKAROUND='0';export CCACHE_PREFIX='/home/ubuntu/mongo-dev/mongo/build/scons/icecream/run-icecc.sh';/usr/bin/icerun /opt/mongodbtoolchain/v3/bin/g++ @build/dynamic_gcc/mongo/db/libop_observer.so.rsp
src/mongo/db/update/pattern_cmp.h:67: error: undefined reference to 'mongo::FieldRef::FieldRef(mongo::StringData)'
src/mongo/db/update/pattern_cmp.h:73: error: undefined reference to 'mongo::FieldRef::getPart(unsigned char) const'
src/mongo/db/update/pattern_cmp.h:77: error: undefined reference to 'mongo::FieldRef::dottedField(unsigned char) const'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

I can't say I understand why this doesn't break in production, but I believe my local build should be a supported scons invocation.
Maybe we don't have dynamically linked builds with `--opt=off` in evergreen? I don't know.

The undefined references are because pattern_cmp.h is using FieldRef in its inline functions, but files that include pattern_cmp.h don't depend on the library that provides the implementation of the FieldRef functions it needs. There's another problem with the pattern_cmp.h, which is the static function checkSortClause. This function will have a distinct and separate definition in each .cpp file that includes pattern_cmp.h. It is called in two places: db/pipeline/expression.cpp and db/update/push_node.cpp. These will each refer to their own private copy of the function. There's no need for this duplication, and it's dangerous because any inline function in a header that refers to checkSortClause will create a One Definition Rule violation as soon as that header is included by 2 .cpp files. I don't see a reason for checkSortClause to be defined in a header. It's not a template or anything, so it can move into a .cpp file and avoid all this trouble.

Comment by Githook User [ 15/Nov/21 ]

Author:

{'name': 'Neil Shweky', 'email': 'neilshweky@gmail.com', 'username': 'Neilshweky'}

Message: SERVER-29425 implement $sortArray in classic engine
Branch: master
https://github.com/mongodb/mongo/commit/ca9bb0300c804617e936c2e2516b441a9474e355

Comment by Githook User [ 15/Nov/21 ]

Author:

{'name': 'Neil Shweky', 'email': 'neilshweky@gmail.com', 'username': 'Neilshweky'}

Message: SERVER-29425 add sortArray visit functions
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/305d5e0cd2828454f40a32a4bb624d54ad1e1c2b

Comment by Andrii Dobroshynski (Inactive) [ 04/Nov/21 ]

Code review URL: https://github.com/10gen/mongo/pull/1634

Comment by Errol Kutan [ 20/Apr/21 ]

This would be very nice to have. There are a handful of particular use-cases. A general example would be if you are using the aggregation pipeline to mimic the functionality of the traditional update operators, but with improvements to overcome shortcomings.

A concrete example of this is the $push : { $each : , $sort : , $slice } functionality, which will not allow you to treat the resulting array into which you are pushing as a set.

The use-case driving this doesn't seem rare or niche–in this case, you have a schema that embeds the top X of any related data entity (could be top X comments, most recent X comments, orders, etc) and that data may come off a queue in an order that is not consistent with the order in which they are maintained in the array (eg–you want most recent X purchases but they could possibly arrive in a random order from the message queuing system. Moreover, you want to be able to deduplicate retransmitted messages from the queue.

 

In this case, the standard $push : { $each : , $sort : , $slice } operators do not work, so you must resort to the aggregation pipeline. However, you cannot use the $unwind stage (and subsequent $sort, then $group) when using the agg pipeline as an update operator, so you need an alternative way of sorting. Currently, you need to create a custom function, so the explicit sort expression is not NEEDED, but it feels silly to require a custom javascript expression for something that is a trivial operation and probably commonly needed, if customers are using MDBQL and MongoDB data modelling correctly . 

Comment by Pawel Terlecki [ 01/Mar/20 ]

Since this expression is no longer necessary for finding index inconsistencies across shards, we decided to ship it later with all the requested functionality. I am discarding the code review.

Comment by David Storch [ 11/Feb/20 ]

The sharding team would like this to be added to our quick win list. It would help with writing a pipeline that can be used with $indexStats to find indexes that are inconsistent across shards.

CC jack.mulrow mira.carey@mongodb.com

Comment by Sagar [ 27/Sep/19 ]

It happens to be similar to the slower solution you posted but adding my solution here in case someone is interested. More details can be found here

https://stackoverflow.com/questions/58065037/mongodb-safely-sort-inner-array-after-group

{"$project":{"sorted_a":{"$reduce":{
  "input":"$a",
  "initialValue":[],
  "in":{"$let":{
    "vars":{"othis":"$$this","ovalue":"$$value"},
    "in":{"$let":{
      "vars":{
        //return index as 0 when comparing the first value with initial value (empty) or else return the index of value from the accumlator array which is closest and less than the current value.
        "index":{"$cond":{
          "if":{"$eq":["$$ovalue",[]]},
          "then":0,
          "else":{"$reduce":{
            "input":"$$ovalue",
            "initialValue":0,
            "in":{"$cond":{
              "if":{"$lt":["$$othis","$$this"]},
              "then":{"$add":["$$value",1]},
              "else":"$$value"}}}}
        }}
      },
      //insert the current value at the found index
      "in":{"$concatArrays":[
          {"$slice":["$$ovalue","$$index"]},
          ["$$othis"],
          {"$slice":["$$ovalue",{"$subtract":["$$index",{"$size":"$$ovalue"}]}]}]}
    }}}}
}}}}
 

Comment by Asya Kamsky [ 27/Sep/19 ]

Set union does guarantee order by current implementation (a set is achieved via maintaining a sorted set, which is how it can quickly determine duplicates).

Now, even if that may not strictly be guaranteed in some arbitrary future implementation, hopefully we will add a proper sort array expression before then.

There is a slower way to do it I show here: https://github.com/asya999/bits-n-pieces/blob/master/scripts/sortArray.js but I'd recommend using $setUnion instead.

Comment by Sagar [ 27/Sep/19 ]

@Asya Sorry, could you explain how it is working ?  why do we need a set union ? If I remove set union it doesn't work. As far as I know from the docs set union doesn't guarantee any ordering. Did you mean to use $lt or something else? 

Comment by Asya Kamsky [ 19/Aug/19 ]

FYI, a workaround is to use existing aggregation expressions to sort the array like this for first example:

{sortedA:{$reduce:{
         input:{$setUnion:"$a"}, 
         initialValue:[], 
         in: {$let:{vars:{o:"$$this"},in: {$concatArrays:[
                  "$$value", 
                  {$filter:{input:"$a",cond:{$eq:["$$o","$$this"]}}}
         ] } }}
}}}
 

and like this for second example:

{sortedA:{$reduce:{
         input:{$setUnion:"$a.y"}, 
         initialValue:[], 
         in: {$let:{vars:{o:"$$this"},in: {$concatArrays:[
                  "$$value", 
                  {$filter:{input:"$a",cond:{$eq:["$$o","$$this.y"]}}}
         ] } }}
}}}
 

Comment by Asya Kamsky [ 02/Jun/17 ]

I think it depends on the syntax we specify. We can say the sortBy field is relative to array (so it must be a path name).

Sorting "a.0" would only make sense if a is an array of arrays (and it's ambiguous since it could also be an array of objects where field named "0" exists and should be used as sort key).

Comment by Kyle Suarez [ 02/Jun/17 ]

What happens if I sort by "a.0"? Does it sort by the first element of the nested arrays within a? Or does it sort according to the field whose name is "0"?

Generated at Thu Feb 08 04:20:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.