Details
-
Task
-
Resolution: Unresolved
-
Major - P3
-
Cluster Scalability
Description
The check for monotonicity in the analyzeShardKey command currently relies on calculating the correlation coefficient between the WiredTiger RecordIds in the index store (i.e. y) and the sequence 1, ..., N (i.e. x), where N is the number of RecordIds. Given that each RecordId is 8 bytes, for a collection that has 10 million unique shard key values, the check would involve storing ~2 * 80MB (about 160MB) of x and y in memory. While the check should be fast, the memory usage can still have a non-negligible impact on the server. To this end, we should consider calculating the correlation coefficient in batches of size N', where N' is small enough to keep memory usage manageable. This paper describes a way to average correlation coefficients (r): transform each r value using Fisher's z transformation, take the average of the z values, and convert that average back to an r value.
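The batching idea above can be illustrated with a minimal Python sketch. It is not the server's implementation: the function names `pearson_r` and `batched_correlation`, the clamping constant `EPS`, and the choice of an unweighted mean over batches are all assumptions made for illustration. Each batch correlates its RecordIds against the local positions 1, ..., N', which is valid because the correlation coefficient does not change when x is shifted by a constant.

```python
import math

EPS = 1e-9  # assumed clamp so that atanh stays finite when |r| == 1


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for constant input; treat as 0
    return sxy / math.sqrt(sxx * syy)


def batched_correlation(record_ids, batch_size):
    """Average per-batch correlations using Fisher's z transformation.

    record_ids may be any iterable (for example a cursor), so at most
    batch_size values are held in memory at a time.
    """
    z_sum = 0.0
    num_batches = 0
    batch = []

    def fisher_z(ys):
        xs = range(1, len(ys) + 1)  # local positions 1..N'
        r = pearson_r(xs, ys)
        r = max(-1 + EPS, min(1 - EPS, r))  # keep atanh in its domain
        return math.atanh(r)  # Fisher's z transformation

    for record_id in record_ids:
        batch.append(record_id)
        if len(batch) == batch_size:
            z_sum += fisher_z(batch)
            num_batches += 1
            batch = []
    if len(batch) >= 2:  # trailing partial batch, if it is large enough
        z_sum += fisher_z(batch)
        num_batches += 1

    if num_batches == 0:
        return 0.0
    return math.tanh(z_sum / num_batches)  # convert mean z back to r


if __name__ == "__main__":
    increasing = (i for i in range(1, 1001))  # streamed, never materialized
    print(batched_correlation(increasing, batch_size=100))
```

With a batch size of N', peak memory is on the order of N' RecordIds rather than 2N values. The clamp is needed because a perfectly monotonic batch yields r = 1, for which Fisher's z is infinite. Weighting each z by its batch size would be a reasonable alternative to the plain mean used here.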
Issue Links
- is related to: SERVER-68753 Make analyzeShardKey command calculate metrics about the monotonicity (Closed)