[SERVER-42565] Aggregations and find commands sort missing fields differently Created: 31/Jul/19 Updated: 29/Oct/23 Resolved: 29/Oct/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.17, 4.3.1, 4.2.3, 4.0.15 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Claire Childs (Inactive) | Assignee: | Justin Seyster |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | qfz, query-44-grooming |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Minor Change |
| Operating System: | ALL |
| Backport Requested: | v4.2, v4.0, v3.6 |
| Steps To Reproduce: | |
| Sprint: | Query 2019-08-26, Query 2019-10-21, Query 2019-11-04 |
| Participants: | |
| Description |
|
An aggregation pipeline's $sort and find command's sort treat missing fields differently. A find command evaluates missing as equivalent to null while an aggregation pipeline evaluates missing as equivalent to undefined. As a consequence, find commands and aggregation pipelines do not guarantee the same sort order when at least one of the documents in a collection is missing at least one of the fields being sorted on. It is likely that this behavior arises from the difference in behavior between the fast and slow methods for extracting a sortKey.
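The difference described above can be illustrated with a small model. This is a minimal sketch in Python, not server code: the sentinels `MISSING` and `UNDEFINED`, the helpers `rank`, `find_style_key`, and `agg_style_key`, and the sample documents are all hypothetical, and only undefined, null, and numbers are modeled (assumed to sort in that order).

```python
# Sentinels for states that a plain Python dict cannot express directly.
MISSING = object()    # hypothetical: the field is absent from the document
UNDEFINED = object()  # hypothetical: stands in for the BSON undefined value


def rank(value):
    """Toy ordering (assumption): undefined < null < numbers."""
    if value is UNDEFINED:
        return (0, 0)
    if value is None:
        return (1, 0)
    return (2, value)


def find_style_key(doc, field):
    """find-style sort key: a missing field is treated like null."""
    value = doc.get(field, MISSING)
    return rank(None if value is MISSING else value)


def agg_style_key(doc, field):
    """$sort-style key as described in this ticket: missing is treated like undefined."""
    value = doc.get(field, MISSING)
    return rank(UNDEFINED if value is MISSING else value)


docs = [
    {"_id": 1, "a": None},
    {"_id": 2},                  # field "a" is missing
    {"_id": 3, "a": UNDEFINED},
    {"_id": 4, "a": 1},
]

find_order = [d["_id"] for d in sorted(docs, key=lambda d: find_style_key(d, "a"))]
agg_order = [d["_id"] for d in sorted(docs, key=lambda d: agg_style_key(d, "a"))]

print(find_order)  # [3, 1, 2, 4]
print(agg_order)   # [2, 3, 1, 4]
```

In this model the document without the field (`_id: 2`) sorts next to null under find-style keys and next to undefined under aggregation-style keys. Python's sort is stable, so ties keep input order here; the order of tied documents on a real server is not something this sketch claims to reproduce.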
|
| Comments |
| Comment by Githook User [ 18/Dec/19 ] |
|
Author: {'name': 'Justin Seyster', 'email': 'justin.seyster@mongodb.com', 'username': 'jseyster'} Message: Note that this backport combines the additional testing from 53d3aae5 (cherry picked from commit 53d3aae5f8e998e6a6625c9e99da8616640d3ba6) |
| Comment by Githook User [ 28/Oct/19 ] |
|
Author: {'name': 'Justin Seyster', 'username': 'jseyster', 'email': 'justin.seyster@mongodb.com'} Message: The problem described by this ticket was fixed as part of work in an |
| Comment by Max Hirschhorn [ 31/Jul/19 ] |
|
This issue further compounds the sort semantics for null, missing, and undefined on sharded collections because "missing" is serialized as null when forwarding the sort key to mongos despite aggregation treating "missing" and undefined (but not null) as equal for sorting. This means that the resulting input streams to mongos are no longer sorted as can be observed in the null -> undefined -> null transitions below. (Note that running --suite=aggregation_sharded_collections_passthrough missing_sort_key.js is an easy way to exercise the merging behavior.)
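The following is not output from the test suite; it is a minimal Python sketch of why a per-shard stream can stop looking sorted once "missing" is serialized as null. The sentinels and the helpers `shard_rank`, `merge_rank`, `serialize`, and `is_sorted` are hypothetical names, the sample stream is made up, and only missing, undefined, and null sort keys are modeled.

```python
MISSING = object()    # hypothetical: sort key for a document without the field
UNDEFINED = object()  # hypothetical: stands in for the BSON undefined value


def shard_rank(key):
    """Shard-side (aggregation) ordering: missing and undefined tie, both before null."""
    return 0 if key is MISSING or key is UNDEFINED else 1


def merge_rank(key):
    """Merge-side ordering over serialized keys: undefined before null."""
    return 0 if key is UNDEFINED else 1


def serialize(key):
    """Missing is forwarded as null; other keys pass through unchanged."""
    return None if key is MISSING else key


def is_sorted(keys, rank):
    """True if every adjacent pair is in non-decreasing rank order."""
    return all(rank(a) <= rank(b) for a, b in zip(keys, keys[1:]))


# A stream the shard considers sorted, since missing and undefined compare equal.
shard_stream = [MISSING, UNDEFINED, MISSING, None]
wire_stream = [serialize(k) for k in shard_stream]  # null, undefined, null, null

print(is_sorted(shard_stream, shard_rank))  # True
print(is_sorted(wire_stream, merge_rank))   # False
```

The serialized stream begins null, undefined, null, which is the kind of transition the comment refers to: a merger that assumes each input stream is already sorted receives one that is not.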
|
| Comment by Max Hirschhorn [ 31/Jul/19 ] |
Just to clarify the behavior a little further: Value::compare() checks whether the canonicalized BSON types are equal before comparing their values. In the extractKeyFast() case, returning "missing" means canonicalizeBSONType() would return 0, whereas in the extractKeyWithArray() case, returning null means canonicalizeBSONType() would return 5. Since we're able to use extractKeyFast() for documents omitting the field, we end up comparing "missing" and undefined as equal in aggregation because they both canonicalize to 0. |
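A rough model of that comparison, as a Python sketch rather than the actual C++: `canonical_type`, `compare`, `extract_key_fast`, and `extract_key_with_array` are hypothetical stand-ins named after the functions mentioned above, and only the canonical values 0 and 5 quoted in the comment are used.

```python
MISSING = object()    # hypothetical: result of extracting an omitted field on the fast path
UNDEFINED = object()  # hypothetical: stands in for the BSON undefined value


def canonical_type(value):
    """Canonical type codes taken from the comment: missing/undefined -> 0, null -> 5."""
    if value is MISSING or value is UNDEFINED:
        return 0
    if value is None:
        return 5
    raise TypeError("only missing, undefined, and null are modeled")


def compare(lhs, rhs):
    """Compare canonical types first. Equal canonical types tie in this model
    because none of the modeled values carries a payload to compare."""
    left, right = canonical_type(lhs), canonical_type(rhs)
    return (left > right) - (left < right)


def extract_key_fast(doc, field):
    """Stand-in for the fast path: an omitted field yields 'missing'."""
    return doc.get(field, MISSING)


def extract_key_with_array(doc, field):
    """Stand-in for the slow path: an omitted field yields null."""
    value = doc.get(field, MISSING)
    return None if value is MISSING else value


doc = {"_id": 1}  # omits field "a"

print(compare(extract_key_fast(doc, "a"), UNDEFINED))        # 0: ties with undefined
print(compare(extract_key_with_array(doc, "a"), UNDEFINED))  # 1: sorts after undefined
```

The same document therefore compares differently against undefined depending on which extraction path produced its sort key.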