Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: cbr_ce_sources
Assigned Teams: Query Optimization
Operating System: ALL
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

If the number of distinct values (NDV) exceeds numberBuckets, the histogram becomes degenerate. For example, with numberBuckets = 10, all distinct values beyond the first 9, which together comprise almost all of the rows in the dataset, are put together in the last bucket. This leads to very skewed estimates. The histogram is neither equal-width nor equal-depth and does not seem to minimize the estimation error in any discernible way.
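For comparison, here is a minimal sketch of what equal-depth bounds would look like on the dataset used in the reproduction in this report (value N present N times, for N from 1 to 999). The function `equalDepthBounds` is a hypothetical helper written for illustration; it is not part of MongoDB and does not describe how the server builds histograms.

```javascript
// Hypothetical helper (not a MongoDB API): equal-depth bucket bounds
// computed from a sorted list of [value, frequency] pairs.
function equalDepthBounds(freqs, numberBuckets) {
    const total = freqs.reduce((sum, [, f]) => sum + f, 0);
    const bounds = [];
    let cumulative = 0;
    for (const [value, f] of freqs) {
        cumulative += f;
        // Close a bucket once the running total reaches the next quantile.
        const target = (total * (bounds.length + 1)) / numberBuckets;
        if (bounds.length < numberBuckets - 1 && cumulative >= target) {
            bounds.push(value);
        }
    }
    // The last bucket always ends at the maximum value.
    bounds.push(freqs[freqs.length - 1][0]);
    return bounds;
}

// Dataset from this report: value N appears N times, N = 1..999.
const freqs = [];
for (let n = 1; n <= 999; n++) {
    freqs.push([n, n]);
}

const edBounds = equalDepthBounds(freqs, 5);
console.log(edBounds); // [ 447, 632, 774, 894, 999 ]
```

With bounds like these, each bucket would hold roughly a fifth of the 499500 documents, instead of nearly all of them sharing one bucket.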

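Value N is present in the collection N times:

// Build a collection in which the value q appears exactly q times (q = 1..999).
db.foo.drop();
let docs = [];
for (let q = 0; q < 1000; q++) {
    for (let i = 0; i < q; i++) {
        docs.push({a: q});
    }
}
db.foo.insert(docs);
// Build a histogram on "a" with far fewer buckets than distinct values.
db.foo.runCommand({analyze: "foo", key: "a", numberBuckets: 5});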

This produces the following bounds:

bounds: [ 1, 2, 4, 5, 999 ]
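To make the skew concrete, the following sketch counts how many documents and distinct values fall under each of the bounds above. It assumes that each bucket covers the values greater than the previous bound and up to and including its own bound; that interpretation, and the helper `docsInRange`, are assumptions made for illustration, not something the report states.

```javascript
// Bounds reported by the analyze command in this report.
const reportedBounds = [1, 2, 4, 5, 999];

// Documents with lo < a <= hi, given that value N appears N times.
function docsInRange(lo, hi) {
    let count = 0;
    for (let n = lo + 1; n <= hi; n++) {
        count += n;
    }
    return count;
}

// Assumption: bucket i covers (previous bound, bound i].
const perBucket = [];
let prevBound = 0;
for (const bound of reportedBounds) {
    perBucket.push({
        bound: bound,
        distinct: bound - prevBound,
        docs: docsInRange(prevBound, bound),
    });
    prevBound = bound;
}
console.log(perBucket);

// Documents strictly between the last two bounds (values 6..998).
const lastRangeDocs = docsInRange(5, 998);
console.log(lastRangeDocs); // 498486
```

Under that assumption the first four buckets hold 1, 2, 7 and 5 documents, while the last one holds 499485 documents spread over 994 distinct values.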

Values 6 through 998, present in 498486 documents, all fall into a single bucket, the one bounded by 999. The rest of the histogram is occupied by the values 1 to 5, present in only 15 documents. This results in wildly inaccurate estimates:

Enterprise test> db.foo.find({a:7}).count();
7
Enterprise test> db.foo.find({a:7}).explain().queryPlanner.winningPlan.cardinalityEstimate;
502
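The estimate of 502 is consistent with spreading the documents between the last two bounds evenly over the distinct values in that range. The sketch below reproduces the arithmetic; treating the bound value 999 separately from the range 6..998 is an inference from the numbers in this report, not a confirmed description of the estimator.

```javascript
// Assumption: the bound value 999 is tracked separately, so the range part
// of the last bucket covers the values 6..998.
let rangeDocs = 0;
for (let n = 6; n <= 998; n++) {
    rangeDocs += n; // value n appears n times
}
const rangeNdv = 998 - 6 + 1; // 993 distinct values in the range

// Uniform spread: every value in the range gets the same estimate.
const uniformEstimate = rangeDocs / rangeNdv;
const actualCount = 7; // db.foo.find({a: 7}).count()

console.log(rangeDocs);                     // 498486
console.log(uniformEstimate);               // 502
console.log(uniformEstimate / actualCount); // about 71.7
```

Under this reading, every value from 6 to 998 receives the same estimate of 502, so the estimate for a = 7 is about 72 times too high, while the estimate for a = 998 would be roughly half the true count.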

Related to: SERVER-99093 histogramCE: Estimate for inequality is zero if NDV > numberBuckets (Open)

Assignee: Unassigned
Reporter: Philip Stoev
Participants: Philip Stoev
Votes: 0
Watchers: 5

Created: Jan 21 2025 07:59:53 AM UTC
Updated: Jun 25 2025 07:04:45 PM UTC
