[DOCS-10607] Docs for SERVER-19402: Change semantics of sorting by array fields in find and aggregate Created: 31/Jul/17 Updated: 29/Oct/23 Resolved: 16/Oct/17 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.11 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Emily Hall | Assignee: | Jeffrey Allen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Participants: | |||||||||
| Days since reply: | 6 years, 17 weeks, 1 day ago | ||||||||
| Epic Link: | DOCS: 3.6 Server | ||||||||
| Description |
Documentation Request Summary:The "Issue Status" box should contain the information needed to document this change. This is a breaking change and should be documented in the "Compatibility Changes in MongoDB 3.6" page. Engineering Ticket Description:Issue Status as of July 21, 2017 ISSUE SUMMARY In previous versions of MongoDB, sorting by a field containing an array suffered from multiple issues:
For more information, see the Pre-existing Behavior sections below. In versions 3.5.11 and newer, the array sort semantics for find and aggregate have been changed to correct these semantic deficiencies, and to be consistent with one another. The new behavior is to sort documents according to the lowest-valued element of the array for ascending sorts, and according to the highest-valued element of the array for descending sorts. For a more detailed description of this behavior, see the Technical Details section below. USER IMPACT This is a minor breaking change. Applications that currently sort by an array field will experience a different sort order. As a consequence of this change, it is no longer correct for a multikey index to be used to provide a sort on an array field. In such cases, the query plan will instead include a blocking SORT stage. For indexes that have path-level multikey metadata, as described in TECHNICAL DETAILS Arrays are now sorted according to their lowest-valued element. For example, consider sorting the following documents by {timestamps: 1}:
Document 0 has two timestamps: July 21 2017, 3:31 PM UTC and July 21 2017, 1:31 PM UTC. Document 1 also has two timestamps: July 15 2017, 3:31 PM UTC and July 21 2017 5:31 PM UTC. Document 0 will sort according to its earliest time, 1:31 PM on July 21. Similarly, document 1 will sort according to its July 15 datetime. Therefore document 1 will sort before document 0 in both find and aggregate:
For a descending sort, the most recent time in the array will be used as the sort key: 3:31 PM on July 21 for document 0 and 5:31 PM on July 21 for document 1. Since the sort is descending these keys are then ordered from most recent to least recent, resulting in document 1 sorting before document 0:
The semantics defining "lowest-valued array element" are identical to those used for index key generation. To determine the sort key for a document d given sort pattern p, we generate the set of index keys K for d by treating p as an index key pattern. The sort key is min(K), where min is determined by the index ordering. Consider the example of sorting the the following documents by {"customers.id": -1, "customers.code": 1}:
The index keys for document 0 are (1, "X") and (2, "Y"). The smallest key in this set according to the sort pattern is (2, "Y"), since we are sorting by customers.id descending. Therefore, (2, "Y") is the sort key for document 0. By similar logic, (3, "W") is the sort key for document 1. Comparing these keys, (3, "W") sorts before (2, "Y"), meaning that document 1 sorts first. PRE-EXISTING BEHAVIOR FOR FIND Prior to this change, the find command's behavior was similar to the above, except the smallest in-bounds element was selected as the sort key. A key was considered in-bounds based on the query predicate. Consider this example:
The sort key for this document would be 2, because it is the smallest key that falls within the bounds of the query predicate. PRE-EXISTING BEHAVIOR FOR AGGREGATE For aggregation sorts prior to this change, the entire array was used as the sort key. The array sort keys were compared element-wise to determine the sort order of the result set. For instance, when sorting these documents in ascending order by field a:
The sort keys would be [3, 1, 5] and [3, 4, 0] respectively. Document 0 sorts first: although the first array elements are equal, the second array element breaks the tie. Original descriptionWhen the query sort algorithm is given an array value to sort on, an array element is chosen as the sort key by using the query predicate to generate sort bounds (see this comment in sort.cpp for an explanation about why this is the case). Instead, we should consider changing the semantics of sorting on an array value, such that the entire array is used as the sort key. Pros:
Cons:
|
| Comments |
| Comment by Githook User [ 17/Oct/17 ] |
|
Author: {'email': 'jeffrey.allen@10gen.com', 'name': 'jeff-allen-mongo', 'username': 'jeff-allen-mongo'}Message: |
| Comment by Jeffrey Allen [ 16/Oct/17 ] |
|
This is RFM: |
| Comment by Jeffrey Allen [ 06/Oct/17 ] |
|
Code review: https://mongodbcr.appspot.com/161820001/ |