[SERVER-42565] Aggregations and find commands sort missing fields differently Created: 31/Jul/19  Updated: 29/Oct/23  Resolved: 29/Oct/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: 3.6.17, 4.3.1, 4.2.3, 4.0.15

Type: Bug Priority: Major - P3
Reporter: Claire Childs (Inactive) Assignee: Justin Seyster
Resolution: Fixed Votes: 0
Labels: qfz, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v4.2, v4.0, v3.6
Steps To Reproduce:

python3 buildscripts/resmoke.py --storageEngine wiredTiger --suites aggregation missing_sort_key.js

missing_sort_key.js

'use strict';
db.my_coll.drop();
 
assert.commandWorked(db.my_coll.insert([
    {_id: 1, uniquer: 1, a: undefined},
    {_id: 2, uniquer: 2, a: null},
    {_id: 3, uniquer: 3, a: []},
    {_id: 4, uniquer: 4},
    {_id: 5, uniquer: 5},
    {_id: 6, uniquer: 6, a: []},
    {_id: 7, uniquer: 7, a: null},
    {_id: 8, uniquer: 8, a: undefined},
]));
 
const findCmd = {
    find: "my_coll",
    sort: {a: -1, uniquer: -1},
    projection: {sortKey: {$meta: 'sortKey'}},
};
 
const aggCmd = {
    aggregate: "my_coll",
    pipeline: [{$sort: {a: -1, uniquer: -1}}],
    cursor: {},
};
 
const findRes = db.runCommand(findCmd);
const res1 = new DBCommandCursor(db, findRes).toArray();
const aggRes = db.runCommand(aggCmd);
const res2 = new DBCommandCursor(db, aggRes).toArray();
print('find command results');
print(tojson(res1));
print('agg command results');
print(tojson(res2));

Sprint: Query 2019-08-26, Query 2019-10-21, Query 2019-11-04
Participants:

 Description   

An aggregation pipeline's $sort and find command's sort treat missing fields differently. A find command evaluates missing as equivalent to null while an aggregation pipeline evaluates missing as equivalent to undefined.  As a consequence, find commands and aggregation pipelines do not guarantee the same sort order when at least one of the documents in a collection is missing at least one of the fields being sorted on. It is likely that this behavior arises from the difference in behavior between the fast and slow methods for extracting a sortKey.

@@@ (fast) fast sort key for {_id: "missing 1", uniquer: 4} is [MISSING, 4]
@@@ (fast) slow sort key for {_id: "missing 1", uniquer: 4} is [null, 4]


diff --git a/src/mongo/db/pipeline/document_source_sort.cpp b/src/mongo/db/pipeline/document_source_sort.cpp
index fe32168c17..4d9fde0afe 100644
--- a/src/mongo/db/pipeline/document_source_sort.cpp
+++ b/src/mongo/db/pipeline/document_source_sort.cpp
@@ -27,6 +27,8 @@
  *    it in the license file.
  */
 
+#define MONGO_LOG_DEFAULT_COMPONENT ::mongo::logger::LogComponent::kQuery
+
 #include "mongo/platform/basic.h"
 
 #include "mongo/db/pipeline/document_source_sort.h"
@@ -41,6 +43,7 @@
 #include "mongo/db/query/collation/collation_index_key.h"
 #include "mongo/platform/overflow_arithmetic.h"
 #include "mongo/s/query/document_source_merge_cursors.h"
+#include "mongo/util/log.h"
 
 namespace mongo {
 
@@ -358,6 +361,11 @@ std::pair<Value, Document> DocumentSourceSort::extractSortKey(Document&& doc) co
 
     auto fastKey = extractKeyFast(doc);
     if (fastKey.isOK()) {
+        log() << "@@@ (fast) fast sort key for " << doc << " is " << fastKey.getValue();
+        auto tempInMemorySortKey =
+            deserializeSortKey(_sortExecutor->sortPattern().size(), extractKeyWithArray(doc));
+        log() << "@@@ (fast) slow sort key for " << doc << " is " << tempInMemorySortKey;
+
         inMemorySortKey = std::move(fastKey.getValue());
         if (pExpCtx->needsMerge) {
             serializedSortKey =
@@ -368,9 +376,12 @@ std::pair<Value, Document> DocumentSourceSort::extractSortKey(Document&& doc) co
         // sort key, which is an object with empty field names. We then need to convert this BSON
         // representation into the corresponding array of keys as a Value. BSONObj {'': 1, '': [2,
         // 3]} becomes Value [1, [2, 3]].
+        log() << "@@@ (slow) fast sort key for " << doc
+              << " couldn't be extracted: " << fastKey.getStatus();
         serializedSortKey = extractKeyWithArray(doc);
         inMemorySortKey =
             deserializeSortKey(_sortExecutor->sortPattern().size(), *serializedSortKey);
+        log() << "@@@ (slow) slow sort key for " << doc << " is " << inMemorySortKey;
     }
 
     MutableDocument toBeSorted(std::move(doc));



 Comments   
Comment by Githook User [ 18/Dec/19 ]

Author:

{'name': 'Justin Seyster', 'email': 'justin.seyster@mongodb.com', 'username': 'jseyster'}

Message: SERVER-42565 Agg and find commands sort missing fields differently

Note that this backport combines the additional testing from 53d3aae5
with a one-line fix from 0b80f48b.

(cherry picked from commit 53d3aae5f8e998e6a6625c9e99da8616640d3ba6)
Branch: v4.2
https://github.com/mongodb/mongo/commit/6e274136bea6ad0dee5874b2085d941f28e831df

Comment by Githook User [ 18/Dec/19 ]

Author:

{'name': 'Justin Seyster', 'email': 'justin.seyster@mongodb.com', 'username': 'jseyster'}

Message: SERVER-42565 Agg and find commands sort missing fields differently

Note that this backport combines the additional testing from 53d3aae5
with a one-line fix from 0b80f48b.

(cherry picked from commit 53d3aae5f8e998e6a6625c9e99da8616640d3ba6)
(cherry picked from commit ae8ddcdb9a15d59aa2dcbea0e95cf6c5e170f9da)
Branch: v4.0
https://github.com/mongodb/mongo/commit/df8de153f6a4d341b8e64990d07f3669eec9f071

Comment by Githook User [ 18/Dec/19 ]

Author:

{'name': 'Justin Seyster', 'email': 'justin.seyster@mongodb.com', 'username': 'jseyster'}

Message: SERVER-42565 Agg and find commands sort missing fields differently

Note that this backport combines the additional testing from 53d3aae5
with a one-line fix from 0b80f48b.

(cherry picked from commit 53d3aae5f8e998e6a6625c9e99da8616640d3ba6)
(cherry picked from commit ae8ddcdb9a15d59aa2dcbea0e95cf6c5e170f9da)
Branch: v3.6
https://github.com/mongodb/mongo/commit/9d74018cd0c058f4fcd5512c3e19d8ca5204306f

Comment by Githook User [ 28/Oct/19 ]

Author:

{'name': 'Justin Seyster', 'username': 'jseyster', 'email': 'justin.seyster@mongodb.com'}

Message: SERVER-42565 Agg and find commands sort missing fields differently

The problem described by this ticket was fixed as part of work in an
earlier commit on master (0b80f48b). This commit adds testing to
verify the fix.
Branch: master
https://github.com/mongodb/mongo/commit/53d3aae5f8e998e6a6625c9e99da8616640d3ba6

Comment by Max Hirschhorn [ 31/Jul/19 ]

This issue further compounds the sort semantics for null, missing, and undefined on sharded collections because "missing" is serialized as null when forwarding the sort key to mongos despite aggregation treating "missing" and undefined (but not null) as equal for sorting. This means that the resulting input streams to mongos are no longer sorted as can be observed in the null -> undefined -> null transitions below. (Note that running --suite=aggregation_sharded_collections_passthrough missing_sort_key.js is an easy way to exercise the merging behavior.)

[ShardedClusterFixture:job0:mongos] 2019-07-31T18:22:37.672-0400 D2 ASIO     [TaskExecutorPool-0] Request 39 finished with response: { cursor: { nextBatch: [
    { _id: "explicit null 2", uniquer: 7.0, a: null, $sortKey: { : null, : 7.0 } },
    { _id: "empty array 2", uniquer: 6.0, a: [], $sortKey: { : undefined, : 6.0 } },
    { _id: "missing 2", uniquer: 5.0, $sortKey: { : null, : 5.0 } },
    { _id: "explicit undefined 1", uniquer: 1.0, a: undefined, $sortKey: { : undefined, : 1.0 } }
 ], id: 0, ns: "test.my_coll" }, ok: 1.0, $configServerState: { opTime: { ts: Timestamp(1564611757, 39), t: 1 } } }
[ShardedClusterFixture:job0:mongos] 2019-07-31T18:22:37.682-0400 D2 ASIO     [TaskExecutorPool-0] Request 40 finished with response: { cursor: { nextBatch: [
    { _id: "explicit null 1", uniquer: 2.0, a: null, $sortKey: { : null, : 2.0 } },
    { _id: "explicit undefined 2", uniquer: 8.0, a: undefined, $sortKey: { : undefined, : 8.0 } },
    { _id: "missing 1", uniquer: 4.0, $sortKey: { : null, : 4.0 } },
    { _id: "empty array 1", uniquer: 3.0, a: [], $sortKey: { : undefined, : 3.0 } }
 ], id: 0, ns: "test.my_coll" }, ok: 1.0, $configServerState: { opTime: { ts: Timestamp(1564611757, 39), t: 1 } } }

Comment by Max Hirschhorn [ 31/Jul/19 ]

It is likely that this behavior arises from the difference in behavior between the fast and slow methods for extracting a sortKey.

Just to clarify the behavior a little further - Value::compare() checks whether the canonicalized version of the BSON types are equal before comparing their values. In the extractKeyFast() case, returning "missing" means canonicalizeBSONType() would return 0, whereas in the extractKeyWithArray() case, returning null means canonicalizeBSONType() would return 5. Since we're able to use extractKeyFast() for documents omitting the field, we end up comparing "missing" and undefined as equal in aggregation because they both canonicalize to 0.

Generated at Thu Feb 08 05:00:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.