[SERVER-69646] [Optimization] Consider making analyzeShardKey command calculate correlation coefficient in batches Created: 13/Sep/22  Updated: 12/Dec/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-68753 Make analyzeShardKey command calculat... Closed
Assigned Teams:
Cluster Scalability
Participants:

 Description   

The check for monotonicity in the analyzeShardKey command currently relies on calculating the correlation coefficient between the WiredTiger RecordIds (i.e. y) in the index store and 1, ..., N (i.e. x), where N is the number of RecordIds. Given that each RecordId is 8 bytes, for a collection that has 10 million unique shard key values, the check would involve storing ~2 * 80MB of x and y in memory. While the check should be fast, the memory usage can still have a non-negligible impact on the server. To this end, we should consider calculating the correlation coefficient in batches of size N', where N' is small enough to keep memory usage manageable. This paper describes a way to average correlation coefficients (r), specifically by transforming the r values using Fisher's z transformation, taking the average of the z values, and then converting that average back to an r value.
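
A minimal Python sketch of the batching idea described above, for illustration only (the server itself is C++, and this is not the analyzeShardKey implementation). The names pearson_r, batched_r and r_clamp are hypothetical. Two assumptions are made here that the description does not state: each batch is correlated against its local positions 1, ..., N' (Pearson's r is unchanged by shifting x), and r is clamped slightly inside (-1, 1) because Fisher's z is infinite at r = ±1, which a perfectly monotonic batch would produce.

```python
import math
from itertools import islice


def pearson_r(ys):
    """Pearson correlation between ys and the positions 1..len(ys)."""
    n = len(ys)
    mean_x = (n + 1) / 2
    mean_y = sum(ys) / n
    cov = sum((i + 1 - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    var_x = sum((i + 1 - mean_x) ** 2 for i in range(n))
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant batch as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def batched_r(record_ids, batch_size, r_clamp=0.999999):
    """Average per-batch r values via Fisher's z transformation.

    Only one batch of record ids is held in memory at a time.
    """
    it = iter(record_ids)
    z_sum = 0.0
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if len(batch) < 2:
            break  # a trailing batch of fewer than two values is skipped
        # Clamp so that atanh stays finite for perfectly monotonic batches.
        r = max(-r_clamp, min(r_clamp, pearson_r(batch)))
        z_sum += math.atanh(r)  # Fisher's z = atanh(r)
        batches += 1
    if batches == 0:
        return 0.0
    return math.tanh(z_sum / batches)  # convert the mean z back to r
```

This sketch takes the plain (unweighted) mean of the z values, as in the description. Weighting each batch by its size would be a reasonable variation when the last batch is much smaller than the others.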



 Comments   
Comment by Adi Zaimi [ 06/Jun/23 ]

I have lost the original reference I found, but I could find [META-ANALYSIS OF CORRELATION COEFFICIENTS: A MONTE CARLO COMPARISON OF FIXED- AND RANDOM-EFFECTS METHODS|https://core.ac.uk/download/pdf/2708611.pdf] and [A note on combining correlations|https://link.springer.com/content/pdf/10.3758/BF03334158.pdf], which describe a few ways this can be accomplished.

Note that the z-transformation method mentioned in the description probably refers to the Silver & Dunlap (1987) paper.

 

Generated at Thu Feb 08 06:14:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.