[SERVER-66957] Histogram-based CE for SargableNodes based on the CE prototype Created: 02/Jun/22  Updated: 29/Oct/23  Resolved: 11/Jul/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Timour Katchaounov Assignee: Alya Berciu
Resolution: Fixed Votes: 0
Labels: M1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
causes SERVER-68048 Coverity analysis defect 122536: Unsa... Closed
causes SERVER-68049 Coverity analysis defect 122540: Copy... Closed
Backwards Compatibility: Fully Compatible
Sprint: QO 2022-06-13, QO 2022-06-27, QO 2022-07-11
Participants:

 Description   

Implement histogram-based CE for SargableNodes based on the prototype implementation in branch "hist-ce".

The goal of this task is to pick all relevant pieces from the 'hist-ce' branch, and implement basic histogram-based CE as follows:

  • Implement the CEInterface interface in a new class (e.g. HistogramCE) similar to the ones for heuristic- and sampling- based CE.
  • This new class should provide transport methods for the following Nodes: SargableNode, ScanNode. All other nodes should be processed by a catch-all transport that uses heuristic estimation.
  • Figure out how to extract the necessary information from Sargable nodes in order to use a histogram directly to estimate their cardinality.
  • Estimate implicit/explicit Boolean expressions inside a single Sargable node via exponential backoff. Reuse the formulas for exponential backoff from SERVER-67166.
  • Add an optimizer knob to disable/enable histogram-based CE.
  • Implement tests as unit tests. For testing purposes hack around the lack of a statistics module that can provide us histograms. There should be two types of tests:
    • Correctness tests - these tests use hand-crafted histograms and predicates that verify that given some known histogram CE produces a certain (hand-verified) estimate.
    • Accuracy tests - these tests generate some data, create a histogram, and run the estimator. The result is compared either against a hand-verified result, or to a result produced via running a query over the input dataset. The hist-ce prototype has examples how this is done.


 Comments   
Comment by Githook User [ 11/Jul/22 ]

Author:

{'name': 'Svilen Mihaylov', 'email': 'svilen.mihaylov@mongodb.com', 'username': 'smihaylov-mongodb'}

Message: SERVER-66957 Integrate CE prototype

Co-authored-by: Svilen Mihaylov <svilen.mihaylov@mongodb.com>
Co-authored-by: Timour Katchaounov <timour.katchaounov@mongodb.com>
Co-authored-by: Alya Berciu <alya.berciu@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/17bb029fcea70ea9486541a492b09d1828acb064

Comment by Githook User [ 08/Jul/22 ]

Author:

{'name': 'auto-revert-processor', 'email': 'dev-prod-dag@mongodb.com'}

Message: Revert "SERVER-66957 Integrate CE prototype"

This reverts commit 568fa8a001eb42e07214acf33fab48e9ad31ca98.
Branch: master
https://github.com/mongodb/mongo/commit/5405347f10033e9c776a1d8be5ed189d6695d4bd

Comment by Githook User [ 08/Jul/22 ]

Author:

{'name': 'Svilen Mihaylov', 'email': 'svilen.mihaylov@mongodb.com', 'username': 'smihaylov-mongodb'}

Message: SERVER-66957 Integrate CE prototype
Branch: master
https://github.com/mongodb/mongo/commit/568fa8a001eb42e07214acf33fab48e9ad31ca98

Generated at Thu Feb 08 06:06:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.