-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Testing Infrastructure
-
Query Optimization
-
Fully Compatible
-
3
-
TBD
The cost model scripts can currently print the field name and field type for each field in the generated collection.
https://github.com/10gen/mongo/blob/master/buildscripts/cost_model/ce_generate_data.py#L209
We would like to extend schema generation as follows:
1. Return a sample of the unique values that were generated for each field.
2. Return, for any given field, the min and max value that was generated. If a field contains values of more than one type, return the min and max for each type.
3. Possibly return other useful information about the distribution of the values in the field.
4. In the existing cost model scripts, if a field contains a mix of types, its field type is reported as just "mixdata". Instead, return an array of the actual types present in the field. https://github.com/10gen/mongo/blob/59f5670aa7d380ea4f02833c4d2642142d3981c2/buildscripts/cost_model/random_generator.py#L60
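The four points above could be prototyped roughly as follows. This is only a sketch under stated assumptions, not the code in `ce_generate_data.py` or `random_generator.py`: the function name `summarize_field`, its parameters, and the shape of the returned dictionary are all hypothetical, and values are grouped by Python type name rather than by BSON type. The `count` and `distinct` entries stand in for the "other distribution information" of point 3.

```python
import random
from collections import defaultdict
from typing import Any


def summarize_field(values: list[Any], sample_size: int = 5, seed: int = 0) -> dict[str, Any]:
    """Summarize the generated values of one field (hypothetical helper)."""
    # Group values by Python type name so mixed-type fields are reported per type.
    by_type: dict[str, list[Any]] = defaultdict(list)
    for value in values:
        by_type[type(value).__name__].append(value)

    rng = random.Random(seed)  # seeded so the sample is reproducible
    summary: dict[str, Any] = {"types": sorted(by_type), "per_type": {}}
    for type_name, typed_values in by_type.items():
        # Deduplicate via repr so unhashable values (lists, dicts) are handled too.
        unique = list({repr(v): v for v in typed_values}.values())
        stats: dict[str, Any] = {
            "count": len(typed_values),
            "distinct": len(unique),
            "sample": rng.sample(unique, min(sample_size, len(unique))),
        }
        try:
            stats["min"] = min(typed_values)
            stats["max"] = max(typed_values)
        except TypeError:
            # Values of this type may have no ordering (e.g. dicts); omit min/max.
            pass
        summary["per_type"][type_name] = stats
    return summary
```

Calling `summarize_field([3, 1, 2, "b", "a", 2.5, 3])` would report the types `float`, `int`, and `str` in place of a single "mixdata" label, with a separate min, max, and sample of unique values for each type.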