Default analyzeShardKeyNumRanges = 1 when analyzeShardKey is run


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • 3
    • TBD

      Consuming numReadsByRanges and numWritesByRanges is only valuable when a user can drill down into which ranges are read-heavy or write-heavy. Without that functionality, the per-range read/write output often raises more questions than it answers. Because of $sample issues, computing it can also block the output of the read and write percentages that analyzeShardKey provides with readWriteDistribution: true.

      When analyzeShardKeyNumRanges is 1, we skip calling getNext() on the $sample aggregation entirely, so we should default it to 1.

      Users who would like to see the per-range output can use setParameter to set analyzeShardKeyNumRanges to 100.
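As a rough illustration of the opt-in described above, here is a minimal Python sketch that builds the two command documents involved. It assumes the parameter can be changed at runtime through setParameter. The namespace `test.orders`, the key `customerId`, and both helper function names are hypothetical and are not taken from this ticket.

```python
def set_num_ranges_command(num_ranges):
    """Build a setParameter command document for analyzeShardKeyNumRanges."""
    # The command name must be the first key; Python dicts keep insertion order.
    return {"setParameter": 1, "analyzeShardKeyNumRanges": num_ranges}


def analyze_shard_key_command(namespace, key):
    """Build an analyzeShardKey command that requests only readWriteDistribution."""
    return {
        "analyzeShardKey": namespace,   # full "db.collection" namespace
        "key": key,                     # candidate shard key pattern
        "keyCharacteristics": False,
        "readWriteDistribution": True,
    }


# Opt back in to per-range output, then run the analysis (hypothetical names).
opt_in = set_num_ranges_command(100)
analyze = analyze_shard_key_command("test.orders", {"customerId": 1})

# With a driver such as pymongo, each document could be sent as an admin
# command, for example: MongoClient(uri).admin.command(opt_in)
```

Which nodes the parameter must be set on is not stated here, so treat that as something to verify before use.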

       

      More context for posterity:

      When calculating the keyCharacteristics metrics, the sampled documents are used to calculate the cardinality, frequency and monotonicity. For that, we do want higher precision, which is why the calculation requires sampling a large number of documents (sampleRate defaults to 1 and sampleSize defaults to 1 million) and we purposely use an index scan instead of $sample, even though performance testing shows the cost of an index scan is non-trivial for large collections.
      When calculating the readWriteDistribution metrics, the sampled documents are only used to define the chunk boundaries, so a rough estimate is acceptable. $sample was chosen mainly because it is the approach resharding uses to find chunk boundaries. Also, we expected $sample to return representative random samples when the shard key has good cardinality and frequency. We later discovered that this assumption didn't always hold, due to the randomness issue in $sample.
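To make the role of the samples concrete, the following Python sketch shows one way sampled shard key values could be turned into range boundaries and then used to count operations per range. It is an illustration under simplifying assumptions (scalar, comparable key values; evenly spaced split points), not the server's actual algorithm, and the function names are hypothetical. It also shows why a single range needs no samples at all.

```python
import bisect


def split_points(sampled_keys, num_ranges):
    """Pick num_ranges - 1 boundaries from sampled shard key values."""
    if num_ranges <= 1 or not sampled_keys:
        return []  # one range covers the whole key space; no samples needed
    keys = sorted(sampled_keys)
    step = len(keys) / num_ranges
    return [keys[int(i * step)] for i in range(1, num_ranges)]


def count_by_ranges(operation_keys, boundaries):
    """Count operations per range; each range includes its lower boundary."""
    counts = [0] * (len(boundaries) + 1)
    for key in operation_keys:
        counts[bisect.bisect_right(boundaries, key)] += 1
    return counts


boundaries = split_points(range(100), 4)           # three split points
reads = count_by_ranges([0, 10, 30, 80, 99], boundaries)
single = count_by_ranges([0, 10, 30, 80, 99], [])  # numRanges = 1
```

If the samples are not representative, the split points cluster in one part of the key space and the per-range counts become misleading, which is the failure mode described above.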

              Assignee:
              Unassigned
              Reporter:
              Ratika Gandhi
              Votes:
              0
              Watchers:
              5

                Created:
                Updated: