-
Type:
Bug
-
Resolution: Fixed
-
Priority:
Critical - P2
-
Affects Version/s: 8.0.19, 8.2.6
-
Component/s: None
-
Query Optimization
-
Fully Compatible
-
ALL
-
v8.3, v8.2, v8.0
-
200
-
None
-
None
-
None
-
None
-
None
-
None
-
None
SUMMARY
On MongoDB Server v8.0.0+, queries against clustered collections can return incorrect results when they are executed as backward (descending) bounded clustered scans over the clustered key and the predicate on _id uses mismatched inclusive/exclusive bounds.
ISSUE DESCRIPTION AND IMPACT
The issue occurs exactly for predicates on the _id field of the form {$lt: X}, {$gt: X}, {$gte: A, $lt: B}, {$gt: A, $lte: B}. combined with sort({_id: -1}).
In these cases, queries such as find({_id: {$lt: 3}).sort({_id: -1}) or find({_id: {$gt: 1, $lte: 3}).sort({_id: -1}) and their aggregation equivalents ($match + $sort), can behave as though $lt were $lte and $gt were $gte.
For example,
const db = db.getSiblingDB("test"); db.dropDatabase(); db.createCollection("coll", {clusteredIndex: {key: {_id: 1}, unique: true}}); db.coll.insertMany([{_id: 1}, {_id: 2}, {_id: 3}, {_id: 4}, {_id: 5}]); printjson(db.coll.find({_id: {$lt: 3}}).sort({_id: -1}).toArray());
Expected output:
[{_id: 1}, {_id: 2}]
Actual (incorrect) output:
[{_id: 1}, {_id: 2}, {_id: 3}]
This issue can only manifest if all of the following are true:
The collection being queried is a clustered collection
The query on _id uses either:
a single comparison operator that is either inclusive or exclusive
e.g. {_id: {$lt: A}} or {_id: {$gte: B}}
multiple comparison operators of different types (one inclusive and one exclusive)
e.g. {_id: {$gte: A, $lt: B}} or {_id: {$gt: A, $lte: B}}
The query applies a descending sort on _id, e.g. .sort({_id: -1}) or aggregation $sort stage with {_id: -1}
When these conditions are satisfied, the query may return incorrect results, where the boundary document at either end of the descending result range is wrongly included or excluded.
DIAGNOSIS AND AFFECTED VERSIONS
Users running queries as described above on or prior to MongoDB 8.0.23, 8.2.9, 8.3.2 may have been affected.
To confirm if the issue currently impacts a particular query, users can validate that their query matches all the criteria described above.
REMEDIATION AND WORKAROUNDS
Affected customers may rewrite vulnerable queries to use a forward (ascending) scan and then reorder results on the client side. This workaround has significant tradeoffs:
Pagination correctness
It does not preserve offset-based pagination semantics by itself. Rewriting find(<vulnerable query>).sort({_id: -1}).skip(5).limit(5) to find(<vulnerable query>).sort({_id: 1}).skip(5).limit(5) and reversing client-side will return the first 5 documents in reverse order rather than the expected last 5 documents.
A countDocuments(<filter>)-based strategy may reduce this problem by first calling countDocument(<filter>) and computing an offset from the end but that introduces an extra race condition window where collection modifications may result in skips or repeats of documents.
Performance
There is no server-side impact on performance of flipping the sort order since it is a clustered index scan but having to do the descending sort on the client could result in increased client memory usage since each result set needs to be fully materialized before doing a descending sort. This need to materialize full result sets also effectively disables streaming patterns.
Remediation
This is strictly a query correctness issue and thus, there is no remediation necessary since there is no direct persistent impact on the user’s data.
However, if the user’s application used the query results and then performed a write, then there may have been write loss due to the missing results. There is no remediation possible in this case.
—-----------------------------------------------------
Original description
In some cases, clustered collections incorrectly return results that should be filtered out by the $lt operator. Specifically, this query:
db.clustered_repro.find({ _id: { $lt: 3 } }).sort({ _id: -1 })
May return a document with { _id: 3 }.
The same happens when using an aggregation pipeline. Removing the sort condition, using ascending sort, or changing to $lte, resolves the issue. But I could find no workaround without changing the query or using a plain collection.
NodeJS script reproducing the issue:
import * as mongo from 'mongodb'; /** * Run with: * export MONGO_TEST_URL='mongodb://localhost:27017/clustered_test' * node script.mjs */ const MONGO_URL = process.env.MONGO_TEST_URL ?? 'mongodb://localhost:27017/clustered_test'; const client = new mongo.MongoClient(MONGO_URL); const db = client.db(); const clusteredName = `clustered_repro`; const plainName = `plain_repro`; await client.connect(); // Ensure the client is closed when the script finishes await using _ = { [Symbol.asyncDispose]: async () => await client.close() }; const buildInfo = await db.command({ buildInfo: 1 }); console.log(`MongoDB version: ${buildInfo.version}`); console.log(`Database: ${MONGO_URL}`); // Make sure the collections don't exist before running the test await db .collection(clusteredName) .drop() .catch(() => {}); await db .collection(plainName) .drop() .catch(() => {}); await db.createCollection(clusteredName, { clusteredIndex: { name: '_id', unique: true, key: { _id: 1 } } }); await db.createCollection(plainName); const clustered = db.collection(clusteredName); const plain = db.collection(plainName); const docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }]; await clustered.insertMany(docs); await plain.insertMany(docs); const plainResults1 = await plain .find( { _id: { $lt: 3 } }, { sort: { _id: -1 } } ) .toArray(); const plainResults2 = await plain .aggregate([ { $match: { _id: { $lt: 3 } } }, { $sort: { _id: -1 } } ]) .toArray(); // This one fails - includes { _id: 3 } const clusteredResults1 = await clustered .find( { _id: { $lt: 3 } }, { sort: { _id: -1 } } ) .toArray(); // This one also fails - includes { _id: 3 } const clusteredResults2 = await clustered .aggregate([ { $match: { _id: { $lt: 3 } } }, { $sort: { _id: -1 } } ]) .toArray(); const clusteredResults3 = await clustered .aggregate([ { $match: { _id: { $lt: 3 } } }, { $sort: { _id: 1 } } ]) .toArray(); const clusteredResults4 = await clustered .find( { _id: { $lte: 2 } }, { sort: { _id: -1 } } ) .toArray(); const clusteredResults5 = await clustered .aggregate([ { $match: { _id: { $lte: 2 } } }, { $sort: { _id: -1 } } ]) .toArray(); console.log('These should all filter out { _id: 3 }'); console.log('Plain 1:', plainResults1, '# find'); console.log('Plain 2:', plainResults2, '# aggregate'); console.log('Clustered 1:', clusteredResults1, '# find'); console.log('Clustered 2:', clusteredResults2, '# aggregate, sort _id: -1'); console.log('Clustered 3:', clusteredResults3, '# aggregate, sort _id: 1'); console.log('Clustered 4:', clusteredResults4, '# find, sort _id: -1, $lte 2'); console.log('Clustered 5:', clusteredResults5, '# aggregate, sort _id: -1, $lte 2');
Tested on 8.0.19 and 8.2.6-rc0.
Results:
MongoDB version: 8.2.6-rc0
Database: mongodb://localhost:27017/clustered_test
These should all filter out { _id: 3 }
Plain 1: [ { _id: 2 }, { _id: 1 } ] # find
Plain 2: [ { _id: 2 }, { _id: 1 } ] # aggregate
Clustered 1: [ { _id: 3 }, { _id: 2 }, { _id: 1 } ] # find
Clustered 2: [ { _id: 3 }, { _id: 2 }, { _id: 1 } ] # aggregate, sort _id: -1
Clustered 3: [ { _id: 1 }, { _id: 2 } ] # aggregate, sort _id: 1
Clustered 4: [ { _id: 2 }, { _id: 1 } ] # find, sort _id: -1, $lte 2
Clustered 5: [ { _id: 2 }, { _id: 1 } ] # aggregate, sort _id: -1, $lte 2
- is related to
-
SERVER-127218 Create a PBT that tests clustered collection correctness
-
- Backlog
-
-
SERVER-75604 Eliminate CollectionScanNode.filter when not needed for clustered collection scans
-
- Closed
-
- related to
-
SERVER-124915 Add clustered_collection_bounded_scan.js to multiversion testing denylist
-
- Closed
-