Core Server
SERVER-69646

[Optimization] Consider making analyzeShardKey command calculate correlation coefficient in batches


Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Team: Cluster Scalability

    Description

      The check for monotonicity in the analyzeShardKey command currently relies on calculating the correlation coefficient between the WiredTiger RecordIds (i.e. y) in the index store and 1, ..., N (i.e. x), where N is the number of RecordIds. Given that each RecordId is 8 bytes, for a collection that has 10 million unique shard key values, the check would involve storing ~2 * 80MB in memory (one ~80MB array each for x and y). While the check itself should be fast, the memory usage can still have a non-negligible impact on the server. To this end, we should consider calculating the correlation coefficient in batches of size N', where N' is more manageable. This paper describes a way to average correlation coefficients (r): transform each r value using Fisher's z-transformation, take the average of the z values, and convert the average back to an r value.
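      The batching idea can be sketched as follows. This is a Python illustration, not the server's C++ implementation; the function names (`pearson_r`, `batched_correlation`), the n - 3 weighting of the z values, and the clamping of r are assumptions based on the standard Fisher-averaging procedure, not on the analyzeShardKey code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def batched_correlation(record_ids, batch_size):
    """Estimate the correlation between RecordIds (y) and their ordinal
    positions 1..N (x) without materializing all N values at once.

    Each batch's r is mapped through Fisher's z-transformation
    (z = atanh(r)), the z values are averaged with the conventional
    n - 3 weights, and the weighted mean is mapped back via tanh.
    """
    z_sum = 0.0
    weight_sum = 0
    for start in range(0, len(record_ids), batch_size):
        batch = record_ids[start:start + batch_size]
        if len(batch) < 4:  # the n - 3 weight needs at least 4 points
            continue
        xs = range(start + 1, start + 1 + len(batch))
        # Clamp so atanh stays finite for perfectly monotonic batches.
        r = max(-0.999999, min(0.999999, pearson_r(xs, batch)))
        weight = len(batch) - 3
        z_sum += weight * math.atanh(r)
        weight_sum += weight
    return math.tanh(z_sum / weight_sum)
```

      Only one batch of RecordIds is held at a time, so peak memory is proportional to N' rather than N; a perfectly monotonic insertion order yields an estimate close to 1, and out-of-order RecordIds pull it down.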


          People

            Assignee: Backlog - Cluster Scalability
            Reporter: Cheahuychou Mao