[SERVER-15340] Array Compound Index Changes Sort Behavior Created: 22/Sep/14 Updated: 23/Sep/14 Resolved: 23/Sep/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Querying |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | João França | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||
| Steps To Reproduce: | Consider the collection "things" for demonstration purposes, with 3 documents:
The query:
returns the 3 documents in the following order: However, if this index is created
The same query returns the documents in a different order (if index is used): Nevertheless, If we want to always replicate this (second) sort behaviour with or without index, it can be done like this:
|
||||||
| Participants: |
| Description |
|
Creating a compound Index on the fields of embedded documents inside an array can change the sort behaviour if the index is used. This was detected on mongo version 2.6.1. |
| Comments |
| Comment by Ramon Fernandez Marina [ 23/Sep/14 ] | ||||||||||||||
|
The sort order is determined by the bounds of the index used in the query. Consider the following example:
With a query like:
possible sort keys for each document are generated according to the key pattern {"arr.points": -1}:
The sort phase will consider all sort keys where arr.round falls within the bounds of the index being used. In this example, before an index is created the query above needs a collection scan. Conceptually, that means we'll consider the "index bounds" for the query predicate (arr.round) to go from MinKey to MaxKey (i.e.: -infinity to +infinity). This means the sort phase will consider all sort keys where arr.round falls within the bounds – that means all of the sort keys listed above. Since the sort is in decreasing order we want the largest arr.points number, so for document "a" we use (1,6), for document "b" we use (2,7), and for "c" we use (2,8), yielding the order "c", "b", "a" since 8 > 7 > 6:
Now let's create an index and examine the query again:
The same query now does an index scan over {"arr.round": 1, "arr.points": -1} with the bounds [1, 1] for arr.round and [MinKey, MaxKey] for arr.points. This index scan only returns documents where "arr.round" == 1, and returns them in descending arr.points, yielding "a", "b", "c" since 6 > 4 > 2.
Note that the documents are already returned in order due to the shapes of the index and the query, so in this specific case there's no need for an explicit sort stage and no need for generating sort keys. Hope this helps to clarify how things work. Those interested in implementation details will find the code here. Regards, | ||||||||||||||
| Comment by Ramon Fernandez Marina [ 23/Sep/14 ] | ||||||||||||||
|
joaofranca, the sort order is determined by the bounds of the index used in the query. I'm working on a detailed answer that can help you and other users that run into this behavior – stay tuned. | ||||||||||||||
| Comment by João França [ 23/Sep/14 ] | ||||||||||||||
|
My guess is that this happens because of how the indexes are created - the pair (round, points) from the same array element are put together. And when the index is used in the sort operation, it automatically uses the points of the array element matched in the query that also used the index. Can I rely on this behaviour (sort order) with the index? Or should I, for instance, use the aggregation to clearly get the intended results?
(I wanted to avoid using the aggregation framework because the index is only used in the first pipeline $match) | ||||||||||||||
| Comment by Ramon Fernandez Marina [ 22/Sep/14 ] | ||||||||||||||
|
joaofranca, the query:
returns all documents, as all of them match the query (arr.round : 1). The sort() will not sort only the documents according of the value of points found on the first document in arr (in your example, 2000, 1500, and 1000), but on all values of arr.points : 2000, 1000, 1000, 1500, 1500, 3000, 1000, 2000, 2000. If you want to sort within the documents in arr you have to use $elemMatch and the "$" positional operator as you have discovered (see also the FAQ on Indexes). Documents like the one in your example generate multiple sort keys, so the sort order you get is undefined (as in your application should not rely on it). Please see this post for more details on this topic. Regards, | ||||||||||||||
| Comment by João França [ 22/Sep/14 ] | ||||||||||||||
|
The last statement in the "Steps To Reproduce" is incorrect. The query
doesn't work (I guess the positional $ operator doesn't work with sort). Is there any right way to do this and sort by an array element's specific field that matches a given condition (not using the aggregation framework)? |