-
Type: Task
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
QO 2025-02-03
Estimation of $exists doesn't take into account the count of missing values and can produce incorrect results.
In the following test case there is a filed with 90% of the values missing:
const collName = 'wrong_exists_estimate'; const coll = db[collName]; coll.drop(); coll.insertMany(Array.from({length: 500}, (_, i) => { const doc = {a: 1}; if (i % 9 === 0) { doc.missing_90_percent = i; } return doc; })); coll.insertMany(Array.from({length: 500}, (_, i) => { const doc = {a: i}; if (i % 9 === 0) { doc.missing_90_percent = i % 3; } return doc; })); coll.createIndexes([{missing_90_percent: 1}]); coll.runCommand({analyze: collName, key: "missing_90_percent", numberBuckets: 10}); db.adminCommand({setParameter: 1, planRankerMode: "histogramCE"}); coll.find({ "missing_90_percent" : { "$exists" : false } }).explain(); coll.find({ "missing_90_percent" : { "$exists" : true } }).explain();
Index scan of the first query is estimated as 888 documents, while index scan of the second query is estimated as 1000 (the whole collection). The second result is clearly wrong. The reason is that the bounds for the second query are
indexBounds: { missing_90_percent: [ '[MinKey, MaxKey]' ] }
These bounds are estimated as fetching the whole collection but this is incorrect because they should take into account the number of missing values.
- duplicates
-
SERVER-99029 Histogram CE for AND/OR trees of all sargable ME nodes
- Closed