[SERVER-68481] Test the performance of CE Created: 02/Aug/22  Updated: 29/Oct/23  Resolved: 22/Dec/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Task Priority: Major - P3
Reporter: Timour Katchaounov Assignee: Anton Korshunov
Resolution: Fixed Votes: 0
Labels: M5
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: QO 2022-10-03, QE 2022-10-17, QO 2022-11-28, QO 2022-12-12, QO 2022-12-26
Participants:

 Description   

Performance testing of cardinality estimation.

  1. Implement a performance testing framework -  the common logic to be used by all CE performance tests.
  2. Pre-generate histograms (either from data or directly) with varying number of buckets, and few different bucket boundary sizes.
  3. Measure the estimation time for all SARGable predicates, and all dataflow nodes both against histograms, and via heuristic estimation.
  4. Measure estimation time for complex AND/OR expressions - grow the expression size.

If possible, consider to use these tests to guard against performance refressions. As a minimum these tests should result in a performance report at the end of the project. The tests should be implemented either as C++ unit tests or Google perf tests.

 

This task should be developed in sync with the following two efforts from the benchmark team: SERVER-68029, SERVER-70646.



 Comments   
Comment by Githook User [ 22/Dec/22 ]

Author:

{'name': 'Anton Korshunov', 'email': 'anton.korshunov@mongodb.com', 'username': 'antkorsh'}

Message: SERVER-68481 Test the performance of CE
Branch: master
https://github.com/mongodb/mongo/commit/ad7518a025d666ddaab7c6ec832cc9080119e221

Comment by Timour Katchaounov [ 08/Nov/22 ]

svilen.mihaylov@mongodb.com thanks for the suggestion. What I meant above is that given a set of tags (ints) without the nodes themselves, this is how we can map tags to ABT types. Of course, if such a mapping already exists, we should reuse that. Is there such a mapping from ABT tags to type names?

Comment by Timour Katchaounov [ 08/Nov/22 ]

Proposal for a low-level design.

Measure the time to do the below operations for the same query:

  • Optimize a query.
    • Measure by wrapping OptPhaseManager::optimize, which is called from CETester::getCE.
    • Measure a number of times (e.g. 100).
    • Compute the avg, min, max times to optimize.
  • Total CE time.
    • Can be done by wrapping each implementation of CEInterface::deriveCE in a timer.
    • Call optimize many times.
    • Sum the time for all calls to deriveCE throughout one single call to optimize().
    • Compute the average, min, max times for deriveCE per one optimize() call.
  • Individual times to optimize each type of node (or each individual node, and type):
    • Done by wrapping each individual transport with a timer.
    • IMO it can also be done just by wrapping deriveCE(), but separating its exec time by the type of input node. This will be much simpler.
    • Nodes can be identified via: Polyvalue::hash(), or their Memo group id: node.getGroupId()
    • Node type can fetched via ABT::tagOf() - it is just a number. In the end it can be mapped via testing what type tag it matches - like: (node.tagOf() == node.tagOf<SargableNode>()). Need to check if there is a table that maps tags to node type names.
    • The measurements can be done in two ways:
      • Call optimize() many times for the same query, collect the time to estimate each node (type), and the number of times that node type was estimated.
      • Call optimize() once, but run each estimation transport() many times.

The above means that there should be a different counter for each mesured type of functionality.
To avoid measuring the overhead of measurement itself, each of the above measurements should be done alone, without measuring any other type of functionality.

Generated at Thu Feb 08 06:10:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.