|
CE accuracy testing requires a variety of datasets against which to test how accurate different estimation methods are.
Currently there are two candidates to generate random datasets:
- buildscripts/cost_model/data_generator.py
- src/mongo/db/query/ce/rand_utils.cpp, and rand_utils_new.cpp
Neither data generation tool can be called easily from a JS test script. The goal of this task is to investigate whether and how we can use the Python data generator to generate a dataset and store it as a JSON file, so that a JS test can later load that file and run various queries against it.
It should be straightforward to regenerate those input datasets or to generate new ones. This could be achieved via a script that takes one or more dataset descriptors and produces a JSON file, which is then stored in a well-known location to be used by the JS test.
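A minimal sketch of such a script follows, to make the descriptor-to-JSON flow concrete. It uses only the Python standard library and does not call buildscripts/cost_model/data_generator.py. The descriptor format, the field types ("int", "choice"), and the names EXAMPLE_DESCRIPTOR, generate_dataset and write_dataset are all hypothetical, and the output directory is left as a parameter because the well-known location is not decided yet.

```python
import json
import random
from pathlib import Path

# Hypothetical descriptor format (an assumption for illustration only,
# not the format used by data_generator.py).
EXAMPLE_DESCRIPTOR = {
    "collection": "ce_uniform_small",
    "size": 100,
    "seed": 42,
    "fields": {
        "a": {"type": "int", "min": 0, "max": 1000},
        "b": {"type": "choice", "values": ["x", "y", "z"]},
    },
}


def generate_value(spec, rng):
    """Produce one random value for a single field specification."""
    if spec["type"] == "int":
        return rng.randint(spec["min"], spec["max"])  # bounds are inclusive
    if spec["type"] == "choice":
        return rng.choice(spec["values"])
    raise ValueError(f"unsupported field type: {spec['type']}")


def generate_dataset(descriptor):
    """Return a list of documents described by the descriptor.

    A fixed seed makes the output reproducible, so the same descriptor
    always regenerates the same dataset.
    """
    rng = random.Random(descriptor["seed"])
    return [
        {name: generate_value(spec, rng) for name, spec in descriptor["fields"].items()}
        for _ in range(descriptor["size"])
    ]


def write_dataset(descriptor, out_dir):
    """Write the dataset as a JSON array to <out_dir>/<collection>.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{descriptor['collection']}.json"
    with out_path.open("w", encoding="utf-8") as f:
        json.dump(generate_dataset(descriptor), f, indent=1)
    return out_path
```

Storing the seed in the descriptor means a dataset can be regenerated from the descriptor alone, and a new dataset only needs a new descriptor. How the JS test reads the resulting file is outside the scope of this sketch.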
This task is about figuring out how to do the above and providing one or two examples. A subsequent task will implement the generation of a variety of datasets.
|