[SERVER-85034] Investigate data generation and loading into JS CE accuracy tests Created: 09/Nov/22  Updated: 12/Jan/24  Resolved: 13/Dec/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Timour Katchaounov Assignee: Timour Katchaounov
Resolution: Fixed Votes: 0
Labels: M7
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-72036 Implement data generation and loading... Closed
Sprint: QO 2022-11-28, QO 2022-12-12, QO 2022-12-26
Participants:

 Description   

CE accuracy testing requires a variety of datasets against which to test how accurate different estimation methods are.

Currently there are two candidates to generate random datasets:

  • buildscripts/cost_model/data_generator.py
  • src/mongo/db/query/ce/rand_utils.cpp, and rand_utils_new.cpp

Neither data generation tool can be called easily from a JS test script. The goal of this task is to investigate whether and how we can use the Python data generator, so that a dataset is generated and stored as a JSON file in a way that allows a JS test to later load that file and run various queries against it.

It should be straightforward to regenerate those input datasets, or generate new ones. This could be achieved via a script that takes some dataset descriptor(s) and produces a JSON file, which is then stored in a well-known location to be used by the JS test.
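As a rough illustration of the descriptor-to-JSON idea, here is a minimal Python sketch. It does not use the actual interface of buildscripts/cost_model/data_generator.py; the descriptor format, the function names (generate_documents, write_dataset), and the output directory are all hypothetical and chosen only for this example. A fixed seed is assumed so that regenerating a dataset yields the same file.

```python
import json
import random
import tempfile
from pathlib import Path

# Hypothetical descriptor format (an assumption, not the real generator's):
# dataset name, document count, RNG seed, and a per-field value specification.
EXAMPLE_DESCRIPTOR = {
    "name": "ce_example_small",
    "size": 100,
    "seed": 42,
    "fields": {
        "a": {"type": "int", "min": 0, "max": 1000},
        "b": {"type": "choice", "values": ["x", "y", "z"]},
    },
}


def generate_documents(descriptor):
    """Generate a list of documents from a descriptor, deterministically."""
    rng = random.Random(descriptor["seed"])  # seeded so output is reproducible
    docs = []
    for i in range(descriptor["size"]):
        doc = {"_id": i}
        for field, spec in descriptor["fields"].items():
            if spec["type"] == "int":
                doc[field] = rng.randint(spec["min"], spec["max"])
            elif spec["type"] == "choice":
                doc[field] = rng.choice(spec["values"])
            else:
                raise ValueError(f"unsupported field type: {spec['type']}")
        docs.append(doc)
    return docs


def write_dataset(descriptor, out_dir):
    """Write the generated documents as a JSON array to <out_dir>/<name>.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{descriptor['name']}.json"
    with out_path.open("w", encoding="utf-8") as f:
        json.dump(generate_documents(descriptor), f, indent=1)
    return out_path


if __name__ == "__main__":
    # A temporary directory stands in for the "well-known location".
    path = write_dataset(EXAMPLE_DESCRIPTOR, tempfile.mkdtemp())
    print(f"wrote {path}")
```

Under this scheme, adding a new dataset would mean adding a new descriptor and rerunning the script. How the JS test reads the resulting file is left open here.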

This task is about figuring out how to do the above and providing one or two examples. A subsequent task will implement the generation of a variety of datasets.



 Comments   
Comment by Timour Katchaounov [ 13/Dec/22 ]

The PR for this investigation ticket has been approved.

It will be merged to master as SERVER-72036.
