Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 9.0.0-rc0
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Query Optimization
Backwards Compatibility:
Fully Compatible
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Similar to ~~SERVER-121256~~, the join cost model currently overestimates the number of random IOs performed by single-table index scans. We currently invoke the Mackert-Lohman formula with docsOutput. This ignores the fact that, for a single index key, all index entries are clustered by RID, so fetching them performs sorted-sparse IO on the collection. We instead need to estimate the NDV (number of distinct values) of the index keys covered by the scan. This can be done by estimating the NDV for index keys and dividing by the selectivity of the range.
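To illustrate why the input matters, here is a minimal sketch of the Mackert-Lohman page-fetch approximation in the form documented in PostgreSQL's cost model. It is not the server's implementation: the function name, the parameter names, and every number in the example are hypothetical, and feeding a distinct-key count instead of docsOutput is only an illustration of the idea described above.

```python
def mackert_lohman_pages(total_pages: float, fetches: float, buffer_pages: float) -> float:
    """Approximate pages read for `fetches` lookups into a table of
    `total_pages` pages with `buffer_pages` of LRU cache available.
    (Form as documented in PostgreSQL; all names here are hypothetical.)"""
    T, Ns, b = float(total_pages), float(fetches), float(buffer_pages)
    if T <= 0 or Ns <= 0:
        return 0.0
    if T <= b:
        # The whole table fits in the buffer: never read more than T pages.
        return min(2 * T * Ns / (2 * T + Ns), T)
    # Number of fetches at which the buffer becomes full.
    limit = 2 * T * b / (2 * T - b)
    if Ns <= limit:
        return 2 * T * Ns / (2 * T + Ns)
    # Past the limit, a fraction (T - b) / T of further fetches miss the buffer.
    return b + (Ns - limit) * (T - b) / T


if __name__ == "__main__":
    pages, cache = 1_000, 10_000      # hypothetical collection and cache sizes
    docs_output = 10_000              # hypothetical documents returned by the scan
    distinct_keys = 50                # hypothetical NDV of index keys in the range
    print("input = docsOutput:", mackert_lohman_pages(pages, docs_output, cache))
    print("input = key NDV:   ", mackert_lohman_pages(pages, distinct_keys, cache))
```

With these made-up inputs, the docsOutput-based call saturates at the page count, while the NDV-based call yields a much smaller estimate, which is the overestimation the ticket describes.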

There are two key technical challenges here:

Arbitrary index scans may contain multikey fields. Our current NDV estimator assumes all fields are non-multikey. I think that ~~SERVER-122379~~ should address this challenge.
Neither JoinCostEstimator nor JoinCardinalityEstimator has access to the single-table QSNs, which we'll need in order to estimate the NDV of the index keys. This may require some refactoring.

This ticket should also invoke Yao's formula to estimate the number of distinct pages the fetch will read.
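For reference, a minimal sketch of Yao's formula: with n records spread evenly over m pages and k records chosen at random without replacement, the expected number of distinct pages touched is m * (1 - C(n - n/m, k) / C(n, k)). The function and parameter names below are hypothetical and the uniform-placement assumption is part of the formula itself; this is not the server's code.

```python
def yao_pages(n_records: int, n_pages: int, k: int) -> float:
    """Expected number of distinct pages touched when fetching k of
    n_records records stored evenly across n_pages pages (Yao's formula)."""
    if n_records <= 0 or n_pages <= 0 or k <= 0:
        return 0.0
    k = min(k, n_records)
    per_page = n_records / n_pages
    # If more records are fetched than exist outside any one page,
    # every page must be touched.
    if k > n_records - per_page:
        return float(n_pages)
    # Probability that a given page is untouched:
    # C(n - p, k) / C(n, k), expanded as a product to avoid huge binomials.
    prob_untouched = 1.0
    for i in range(k):
        prob_untouched *= (n_records - per_page - i) / (n_records - i)
    return n_pages * (1.0 - prob_untouched)


if __name__ == "__main__":
    # Hypothetical collection: 100 documents on 10 pages.
    for k in (1, 5, 20, 100):
        print(k, yao_pages(100, 10, k))
```

The product form runs in O(k); for very large k a closed-form approximation may be preferable, which is a design choice left open here.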

After this ticket, we may require a ticket similar to ~~SERVER-122265~~ which accounts for the sorted-sparse IO for single table index scans.

is duplicated by

SERVER-122145 [Join Optimization] Use Yao's formula for estimation of random I/O for single table index scan

Closed

is related to

SERVER-122379 Extend countNDV method to support multikey, filtered cases

Closed

SERVER-121256 [Join Optimization] INLJ costing - don't consider every probed document as a random IO

Closed

SERVER-122265 [Join Optimization] Costing of INLJ ignores cost of sorted-sparse I/O to fetch records from collection

Closed

related to

SERVER-122381 Investigate plan quality/stability for TPC-H Q14

Closed

SERVER-123168 [Join Optimization] Code cleanup: Don't pass SamplingEstimators to JoinCardinalityEstimator

Open


Assignee: Ben Shteinfeld
Reporter: Ben Shteinfeld
Participants: Ben Shteinfeld, Githook User
Votes: 0
Watchers: 2

Created: Mar 27 2026 05:48:54 PM UTC
Updated: Apr 01 2026 03:37:13 PM UTC
Resolved: Apr 01 2026 03:37:13 PM UTC
