- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Optimization
- ALL
I believe it has been discussed before that if the samplingCE returns an estimate of 0, this is a low-confidence estimate and should probably be ignored.
There is an additional problem, however: the case where just one of the sampling attempts matches. In this case, the cardinality is massively overestimated.
For a table with 50K unique values where just 1 value matches the predicate, running 1K explains gives:
995 × cardinalityEstimate: 0
5 × cardinalityEstimate: 130.2109375
That is, in 99.5% of the cases the samplingCE never finds a matching document and reports an estimate of 0. The problem is the other 0.5% of the cases: the sample contains a single hit, and from that one hit samplingCE extrapolates a cardinalityEstimate of 130.
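This bimodal behavior can be reproduced with a quick simulation. The sample size and the linear extrapolation formula below are assumptions chosen for illustration; the real samplingCE parameters and estimator may differ:

```python
import random

COLLECTION_SIZE = 50_000   # table with 50K unique values
MATCHING_DOCS = 1          # exactly one value matches the predicate
SAMPLE_SIZE = 384          # hypothetical sample size, not the real samplingCE value

def sampling_estimate(rng):
    # Draw SAMPLE_SIZE documents uniformly at random and count predicate hits.
    hits = sum(1 for _ in range(SAMPLE_SIZE)
               if rng.randrange(COLLECTION_SIZE) < MATCHING_DOCS)
    # Extrapolate: estimated cardinality = hit fraction * collection size.
    # A single hit therefore contributes COLLECTION_SIZE / SAMPLE_SIZE (~130 here).
    return hits / SAMPLE_SIZE * COLLECTION_SIZE

rng = random.Random(42)
estimates = [sampling_estimate(rng) for _ in range(1000)]
zeros = sum(1 for e in estimates if e == 0)
nonzero = [e for e in estimates if e > 0]
print(f"{zeros} runs estimated 0; nonzero estimates: {sorted(set(nonzero))}")
```

The estimate is either 0 (no hit in the sample) or jumps straight to roughly COLLECTION_SIZE / SAMPLE_SIZE; nothing in between is possible, which is exactly the 0-vs-130 pattern seen in the explains.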
While this is understandable from a statistics/probability standpoint, it is dangerous in a query optimization context. Such a difference in the estimates may cause the "default" plan to periodically flip to an "alternative" plan and back again. Such instability is a bane for DBAs – it would be preferable to select the same plan in 100% of the cases, even if it is the suboptimal one.
I believe some sort of protection against this situation is in order – such an estimate must be treated as low confidence and either A) not used at all (though the heuristicCE will probably do even worse) or B) sampling should continue until the confidence rises above some threshold.
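Option B could be sketched roughly as follows: sample in batches, maintain a confidence interval on the selectivity, and only trust the estimate once the interval is tight; otherwise flag it as low confidence so the planner can fall back or pin the plan. Everything here (batch size, budget, the Wilson interval, the stopping rule) is a hypothetical illustration, not MongoDB's actual samplingCE logic:

```python
import math
import random

COLLECTION_SIZE = 50_000
MATCHING_DOCS = 1
BATCH = 100          # hypothetical documents sampled per round
MAX_SAMPLE = 2_000   # hypothetical total sampling budget
Z = 1.96             # ~95% confidence

def wilson_interval(hits, n, z=Z):
    # Wilson score interval for the true selectivity given hits out of n samples.
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def estimate_with_confidence(rng):
    hits = n = 0
    while n < MAX_SAMPLE:
        hits += sum(1 for _ in range(BATCH)
                    if rng.randrange(COLLECTION_SIZE) < MATCHING_DOCS)
        n += BATCH
        lo, hi = wilson_interval(hits, n)
        # Arbitrary stopping rule: need at least 2 hits and an interval whose
        # upper bound is within 3x of its lower bound (floored at 1 document).
        if hits >= 2 and hi <= 3 * max(lo, 1 / COLLECTION_SIZE):
            return hits / n * COLLECTION_SIZE, "high"
    # Budget exhausted with 0 or 1 hits: report the interval's upper bound
    # instead of a wild point estimate, flagged as low confidence.
    lo, hi = wilson_interval(hits, n)
    return hi * COLLECTION_SIZE, "low"

rng = random.Random(7)
card, confidence = estimate_with_confidence(rng)
print(f"estimate={card:.1f} confidence={confidence}")
```

For a predicate this selective, the budget runs out with at most one hit, so the estimate comes back flagged "low" with a bounded value rather than an extrapolation from a single lucky sample – which is the stability property the ticket asks for.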
- Depends on: WT-14117 Improve random cursor for creating even split points design proposal document (Open)