[SERVER-7528] Order of responses to a MongoDB $in query Created: 01/Nov/12  Updated: 06/Apr/23

Status: Open
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: features we're not sure of

Type: New Feature Priority: Trivial - P5
Reporter: Jonathan Richard Ong Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 25
Labels: query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-22193 Allow returning documents in the same... Closed
is duplicated by SERVER-20788 Ordering results from $in query -- ca... Closed
Related
is related to SERVER-32947 Does mongodb $lookup operator ruins d... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Participants:

 Description   

I have a use-case where I am querying an array of ObjectIDs. For example:

db.things.find({
_id:

{ $in: [ ObjectID(...), ObjectID(...), ObjectID(...), ObjectID(...) ] }

})

I'd like the results to be ordered by `_id.$in`. Right now I'm resorting to client-side sorting, which kind of sucks since it requires converting between ObjectIDs and strings among other hacks.

A special sorting option like this would be awesome (similar to $natural):

db.things.find({}).sort({
$in:

{ '_ids' }

})

Although the attribute being `$in`ed need not be `_id`, it should be unique (and thus indexed).

Some related questions on StackOverflow:

http://stackoverflow.com/questions/3142260/order-of-responses-to-mongodb-in-query
http://stackoverflow.com/questions/11839515/comparing-and-sorting-mongodb-objectids-in-node-convert-to-string



 Comments   
Comment by Slava Fomin [ 16/Dec/21 ]

I also has to resort to client-side sorting, however, it would be much better if I could receive the pre-sorted documents like I did in PostgreSQL.

Comment by NOVALUE Mitar [ 12/Nov/20 ]

Also, using $indexOfArray approach is O(n^2), isn't it? Which becomes prohibitive with large arrays of IDs as well.

Comment by NOVALUE Mitar [ 12/Nov/20 ]

I again hit this issue when I wanted to use ElasticSearch for full-text search, which then returns 100k result IDs, I want then to query MongoDB using such long list of result IDs, getting them batch by batch in same order from the database, potentially filtering it further. I think it is problematic that a) I have to include the whole list in the query itself (no prepared statements like SQL) b) the order does not match. I have to pass whole 100k IDs in because if further filter, I want each batch/page to be the same size, not that I first pass 100 by 100 IDs into the query, just to get back a subset of those, having then batch/page of uneven sizes.

Comment by bin zhang [ 28/Jun/20 ]

Any plan to implement this feature? It's useful to keep the original order in case the original input ids are already sorted.

db.things.aggregate({$match:{_id:{$in:order}}}

 

Comment by Asya Kamsky [ 05/Dec/17 ]

The direction we are heading is bringing the "normal" query and aggregation closer together, so I'm not sure the distinction between "normal" and not normal is as big as it used to be...

Comment by NOVALUE Mitar [ 04/Dec/17 ]

I mean, the issue is where we draw a line of what all we have to do in aggregation pipelines, and what we can do as a normal query. While it is great to see that it is possible to do this in aggregation, it would be much cooler if a simple query could be sorted by an array, without having to do aggregation.

Comment by Asya Kamsky [ 04/Dec/17 ]

Given there is a way to get results in the order specified in $in as of 3.4 is there a reason to keep this ticket open?

In other words, are there scenarios that would not be satisfied with aggregation sorting of resulting documents?

Comment by Asya Kamsky [ 24/Oct/16 ]

In upcoming 3.4 this is possible to do with an aggregation pipeline.

I write it up here: http://www.kamsky.org/stupid-tricks-with-mongodb/using-34-aggregation-to-return-documents-in-same-order-as-in-expression

Basically, it's a matter of adding a field with order of matched field in the array that specified sort order.

Using initial example from this ticket:

order=[ ObjectID(...), ObjectID(...), ObjectID(...), ObjectID(...) ];
db.things.aggregate({$match:{_id:{$in:order}}}, {$addFields:{__order: { $indexOfArray : [ order, "$name" ]}}}, {$sort:{ "__order":1}});

Comment by Ali Rahbari [ 27/Aug/16 ]

This is really a necessary feature. For example in a system where user can feature items, they must see them in the order they featured them. In SQL days this was done using a join so you wouldn't actually care about in. I believe Mongodb must preserve the $in order unless an order by clause is present. Doing this client side would be so frustrating considering we must do the sort as well as paging!

Comment by Dissatisfied Former User [ 20/Aug/16 ]

I, too, am looking forward to this. Sure, I could batch retrieve the document set and assign to a dictionary (in Python), then re-iterate the value I passed to $in, but this is a poor solution for anything but trivial numbers of results. A streaming solution (since I typically use generators for my $in values, too) is greatly superior.

Due to the existing sorting syntax, I'd recommend the following form:

{'$in': 1}

With potentially mixed usage:

{age: -1, '$in': 1}

This would imply a limitation of only one $in filter to satisfy the query in this way, similar to the current $ limitation. This is acceptable to me, possibly not to others, but importantly, does conform to the existing syntax.

Edited to add: explicit lookup of documents by ID in the pre-arranged order is a specialization of the general form I suggest. Because it's so specialized (single, automatic index, no other possible ordering) it may be worthwhile to implement as a distinct find command. findSpecific or findById or similar. Also likely easier to implement as such.

Comment by Vivek Gounder [ 19/Aug/15 ]

Doing it in the client makes it very odd and strange indeed. It would be lovely if we had the server side implementation for the clients to have a sorting option while using the in query.

Comment by Jonathan Richard Ong [ 30/May/14 ]

What's the recommended way to do this now?

Comment by Asya Kamsky [ 29/May/14 ]

The $or work-around hack no longer works starting with 2.6.x - it was a side effect of implementation which has changed.

Comment by Piyush Verma [ 02/Nov/12 ]

Ah I get it, where i was going wrong.
The Order of Insertion is what the order of output document is while using aggregate. Whereas when you do find the order is _id. Never mind.
Ignore the earlier comments.

Comment by Piyush Verma [ 02/Nov/12 ]

Hey Aaron,
try this.

import pymongo
import random
 
array = range(1, 20)
random.shuffle(array)
 
con = pymongo.Connection(safe=True)
con.drop_database('test')
for i in array:
    con.test.test.insert({'_id': i})
 
test = con.test.test
 
_map = {}
for x in test.find({'_id': {'$in': array}}):
    _map[x['_id']] = x
 
normal = [_map[i] for i in array]
 
aggregate = test.aggregate({
    '$match':{'_id': {'$in': array}},
    '$project': {'_id': 1}
    })
 
print normal
print aggregate.get('result')

Comment by Aaron Staple [ 02/Nov/12 ]

Hi Piyush,

Can you send a full test script as I have above?

Comment by Piyush Verma [ 02/Nov/12 ]

To my surprise even this seems to work in pymongo:

c.aggregate(
{'$match':{'_id': {'$in': array}},
{'$project': {'_id': 1}}
)

Odd ?

Comment by Piyush Verma [ 02/Nov/12 ]

Solution 1 and 3 are non Generic. I will have to Embed the calculated ranks rather than seek Array Index. Since there is no way to provide a lazy function, which can be evaluated later.

Solution 2: Yeah it does the Job, although i doubt its THE optimal way to achieve it.

Comment by Aaron Staple [ 02/Nov/12 ]

Here are a few different ways to do this with the aggregation framework. Might become simpler as we add additional aggregation operators:

c = db.c;
c.drop();
 
c.save( { _id:'a' } );
c.save( { _id:'c' } );
c.save( { _id:'b' } );
 
printjson( c.aggregate( { $match:{ _id:{ $in:[ 'b', 'c', 'a' ] } } },
                        { $project:{ _id:1, rank:{ $cond:[ { $eq:[ '$_id', 'b' ] }, 0,
                                                 { $cond:[ { $eq:[ '$_id', 'c' ] }, 1,
                                                 { $cond:[ { $eq:[ '$_id', 'a' ] }, 2, null ] } ] } ] } } },
                        { $sort:{ rank:1 } },
                        { $project:{ _id:1 } }
) );
 
printjson( c.aggregate( { $match:{ _id:{ $in:[ 'b', 'c', 'a' ] } } },
                        { $group:{ _id:null, id:{ $push:'$_id' }, ordered:{ $first:{ $const:[ 'b', 'c', 'a' ] } } } },
                        { $unwind:'$ordered' },
                        { $unwind:'$id' },
                        { $project:{ _id:'$id', matches:{ $eq:[ '$ordered', '$id' ] } } },
                        { $match:{ matches:true } },
                        { $project:{ _id:1 } }
) );
 
printjson( c.aggregate( { $match:{ _id:{ $in:[ 'b', 'c', 'a' ] } } },
                        { $project:{ _id:1, ranks:{ $const:[ { k:'b', v:0 }, { k:'c', v:1 }, { k:'a', v:2 } ] } } },
                        { $unwind:'$ranks' },
                        { $project:{ _id:1, rank:'$ranks.v', matches:{ $eq:[ '$_id', '$ranks.k' ] } } },
                        { $match:{ matches:true } },
                        { $sort:{ rank:1 } },
                        { $project:{ _id:1 } }
) );

Comment by Piyush Verma [ 01/Nov/12 ]

Would really help to have this functionality. +1

Comment by Jonathan Richard Ong [ 01/Nov/12 ]

Thanks for that. Using $or is a much better hack. I generally have a limited number of ids so I don't think scaling would be a problem, assuming 1000 is not large.

Most of all, thanks for the quick reply!

Comment by Aaron Staple [ 01/Nov/12 ]

Alternatively, if you want to use the aggregation framework, you could create a custom field based on the position of an id in your array and then sort by that field.

Comment by Aaron Staple [ 01/Nov/12 ]

Hi Jonathan,

As a workaround for now, if you instead make your query

db.things.find( { $or:[

{ _id:ObjectId( … ) }

,

{ _id:ObjectId( … ) }

… ] } )

all existing versions of mongo will give you the order you want assuming you have an index on _id. This behavior may not be guaranteed in future versions of mongo, so it might be considered a bit of a hack. Also, keep in mind that $or will not scale as well for large numbers of ids in your list as $in would. Here is a test example:

c = db.c;
c.drop();
 
c.save( { _id:0 } );
c.save( { _id:2 } );
c.save( { _id:1 } );
 
printjson( c.find( { _id:{ $in:[ 1, 2, 0 ] } } ).toArray() );
printjson( c.find( { $or:[ { _id:1 }, { _id:2 }, { _id:0 } ] } ).toArray() );

Generated at Thu Feb 08 03:14:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.