[SERVER-19402] Change semantics of sorting by array fields in find and aggregate Created: 14/Jul/15 Updated: 19/May/22 Resolved: 21/Jul/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.11 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Backlog - Query Team (Inactive) | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Minor Change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Query 2017-06-19, Query 2017-07-10, Query 2017-07-31 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Issue Status as of July 21, 2017 ISSUE SUMMARY
For more information, see the Pre-existing Behavior sections below. In versions 3.5.11 and newer, the array sort semantics for find and aggregate have been changed to correct these semantic deficiencies, and to be consistent with one another. The new behavior is to sort documents according to the lowest-valued element of the array for ascending sorts, and according to the highest-valued element of the array for descending sorts. For a more detailed description of this behavior, see the Technical Details section below. USER IMPACT
TECHNICAL DETAILS
Document 0 has two timestamps: July 21 2017, 3:31 PM UTC and July 21 2017, 1:31 PM UTC. Document 1 also has two timestamps: July 15 2017, 3:31 PM UTC and July 21 2017 5:31 PM UTC. Document 0 will sort according to its earliest time, 1:31 PM on July 21. Similarly, document 1 will sort according to its July 15 datetime. Therefore document 1 will sort before document 0 in both find and aggregate:
For a descending sort, the most recent time in the array will be used as the sort key: 3:31 PM on July 21 for document 0 and 5:31 PM on July 21 for document 1. Since the sort is descending these keys are then ordered from most recent to least recent, resulting in document 1 sorting before document 0:
The semantics defining "lowest-valued array element" are identical to those used for index key generation. To determine the sort key for a document d given sort pattern p, we generate the set of index keys K for d by treating p as an index key pattern. The sort key is min(K), where min is determined by the index ordering. Consider the example of sorting the following documents by {"customers.id": -1, "customers.code": 1}:
The index keys for document 0 are (1, "X") and (2, "Y"). The smallest key in this set according to the sort pattern is (2, "Y"), since we are sorting by customers.id descending. Therefore, (2, "Y") is the sort key for document 0. By similar logic, (3, "W") is the sort key for document 1. Comparing these keys, (3, "W") sorts before (2, "Y"), meaning that document 1 sorts first. PRE-EXISTING BEHAVIOR FOR FIND
The sort key for this document would be 2, because it is the smallest key that falls within the bounds of the query predicate. PRE-EXISTING BEHAVIOR FOR AGGREGATE
The sort keys would be [3, 1, 5] and [3, 4, 0] respectively. Document 0 sorts first: although the first array elements are equal, the second array element breaks the tie. Original descriptionWhen the query sort algorithm is given an array value to sort on, an array element is chosen as the sort key by using the query predicate to generate sort bounds (see this comment in sort.cpp for an explanation about why this is the case). Instead, we should consider changing the semantics of sorting on an array value, such that the entire array is used as the sort key. Pros:
Cons:
|
| Comments |
| Comment by Githook User [ 21/Jul/17 ] |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: |
| Comment by Githook User [ 20/Jun/17 ] |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: This eliminates the behavior in which the lowest-valued Note that it is no longer correct for a multikey index to |