Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- M4

Assigned Teams: Query Optimization
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The current NDV (number of distinct values) estimation interface assumes that we first collect the sample and then count the NDVs for each field-name tuple separately. This is simple and flexible, but it adds significant overhead because each pass over the sample is expensive.

This ticket is to improve the performance of NDV computation by either computing all NDV estimates together in a single pass over the sample or, more efficiently still, computing them during sample collection. Either approach requires collecting all field-name tuples for NDV computation up front.
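As a rough illustration of the single-pass idea, the sketch below counts sample NDVs for several field-name tuples while iterating over the sample once. It is Python pseudocode for the approach, not the server's implementation: `count_sample_ndvs`, `get_path`, and `freeze` are hypothetical names, documents are modeled as plain dictionaries, and dotted paths are resolved naively without array traversal.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence, Tuple

FieldTuple = Tuple[str, ...]


def get_path(doc: Any, path: str) -> Any:
    """Resolve a dotted path in a nested mapping (hypothetical helper).

    Returns None when the path is missing, so a missing field and an
    explicit null are treated as the same value in this sketch.
    """
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, Mapping) or part not in cur:
            return None
        cur = cur[part]
    return cur


def freeze(value: Any) -> Any:
    """Convert nested dicts and lists into hashable tuples."""
    if isinstance(value, Mapping):
        return tuple((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    return value


def count_sample_ndvs(
    sample: Iterable[Mapping[str, Any]],
    field_tuples: Sequence[FieldTuple],
) -> Dict[FieldTuple, int]:
    """Count distinct values for every field tuple in one pass over the sample."""
    seen = {ft: set() for ft in field_tuples}
    for doc in sample:
        # Each path is extracted at most once per document, even when it
        # appears in several field tuples.
        cache: Dict[str, Any] = {}
        for ft in field_tuples:
            key = []
            for path in ft:
                if path not in cache:
                    cache[path] = freeze(get_path(doc, path))
                key.append(cache[path])
            seen[ft].add(tuple(key))
    return {ft: len(values) for ft, values in seen.items()}
```

Because the function only needs an iterable, the same loop body could in principle run as documents arrive during sample collection, which is why the field-name tuples have to be known up front.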

There are several other ideas for performance improvements here:

- When counting NDVs within the sample, reduce the number of passes over a single document (we could try to re-use the SBE expression field path work, but it would require the sample to be persisted).
- Refactor the method-of-moments Newton-Raphson iteration so that the most expensive computation is done only once per iteration.
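The Newton-Raphson refactor can be sketched as follows. The ticket does not state which moment equation the estimator solves, so this example assumes the common form d = D(1 - e^(-n/D)), where n is the sample size, d is the NDV observed in the sample, and D is the unknown NDV; the real estimator may use a different equation. `estimate_ndv_mom` and its parameters are hypothetical names. The point of the sketch is only that the exponential is evaluated once per iteration and shared between the function value and its derivative.

```python
import math


def estimate_ndv_mom(
    sample_size: int,
    sample_ndv: int,
    population_size: int,
    tol: float = 1e-9,
    max_iter: int = 100,
) -> float:
    """Solve d = D * (1 - exp(-n / D)) for D with Newton-Raphson.

    Assumed moment equation; not confirmed by the ticket.
    """
    n, d = sample_size, sample_ndv
    if d <= 0:
        return 0.0
    if d >= n:
        # Every sampled value is distinct: the equation has no finite root,
        # so fall back to the population size as an upper bound.
        return float(population_size)

    estimate = float(d)  # the true NDV is at least the sample NDV
    for _ in range(max_iter):
        ratio = n / estimate
        e = math.exp(-ratio)  # the single expensive call per iteration
        f = estimate * (1.0 - e) - d
        f_prime = 1.0 - e * (1.0 + ratio)  # reuses e instead of recomputing it
        if f_prime <= 0.0:
            break
        step = f / f_prime
        estimate -= step
        if abs(step) <= tol * estimate:
            break
    return min(estimate, float(population_size))
```

Starting from the sample NDV keeps the iterate on the side of the root where this particular function is increasing and concave, so the iterates in this sketch approach the root from below rather than overshooting it.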

is related to:
- SERVER-117085 Avoid NDV computation if we have a unique index on the joining fields (Closed)

related to:
- SERVER-112337 Use sampling infrastructure to compute NDVs of joining fields (Closed)
- SERVER-112233 Generate an appropriate projection for sampling estimators (Needs Scheduling)

Assignee: Unassigned
Reporter: Hana Pearlman
Participants: Hana Pearlman
Votes: 0
Watchers: 3

Created: Oct 09 2025 11:58:53 AM UTC
Updated: Jan 13 2026 08:00:58 PM UTC