Extend cost model script schema generation code to include sample values


    • Query Optimization
    • Fully Compatible
    • None
    • 3
    • TBD

      The cost model scripts currently print the field name and field type for each field in the generated collection:
      https://github.com/10gen/mongo/blob/master/buildscripts/cost_model/ce_generate_data.py#L209

      We would like to extend schema generation as follows:
      1. Return a sample of the unique values that were generated for each field.
      2. Return, for any given field, the min and max values that were generated. If a field contains a mix of types, return the min and max for each type.
      3. Possibly return other useful information about the distribution of values in the field.
      4. In the existing cost model scripts, a field that contains a mix of types is reported with the field type "mixdata". Instead, return an array of the actual types present in the field. https://github.com/10gen/mongo/blob/59f5670aa7d380ea4f02833c4d2642142d3981c2/buildscripts/cost_model/random_generator.py#L60
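
      The per-field summary described in items 1, 2, and 4 could look roughly like the sketch below. It is illustrative only and does not use the existing cost model classes: the function name `summarize_field`, the `sample_size` parameter, and the shape of the returned dictionary are all assumptions made for this example. It also assumes the generated values are hashable and comparable within each type, which holds for numbers and strings but not for nested arrays or documents.

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize_field(values: List[Any], sample_size: int = 5) -> Dict[str, Any]:
    """Summarize the generated values of one field, grouped by Python type name.

    Hypothetical helper: not part of buildscripts/cost_model.
    """
    # Group values by type so that min/max are only computed over comparable values.
    by_type: Dict[str, List[Any]] = defaultdict(list)
    for value in values:
        by_type[type(value).__name__].append(value)

    # "types" replaces a single "mixdata" label with the actual types present.
    summary: Dict[str, Any] = {"types": sorted(by_type), "per_type": {}}
    for type_name, typed_values in by_type.items():
        # Deduplicate while keeping first-seen order (requires hashable values).
        unique = list(dict.fromkeys(typed_values))
        summary["per_type"][type_name] = {
            "min": min(typed_values),
            "max": max(typed_values),
            "distinct_count": len(unique),
            "sample": unique[:sample_size],
        }
    return summary


if __name__ == "__main__":
    # A field holding a mix of integers and strings.
    print(summarize_field([3, "b", 1, "a", 3]))
```

      Grouping by type before taking the min and max avoids comparing values of different types, which raises a `TypeError` in Python. The `distinct_count` entry is one possible piece of distribution information for item 3; a histogram or frequency table could be added in the same place.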

            Assignee:
            Andi Wang
            Reporter:
            Andi Wang
            Votes:
            0
            Watchers:
            2
