Core Server / SERVER-69646

[Optimization] Consider making analyzeShardKey command calculate correlation coefficient in batches

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Cluster Scalability

      The check for monotonicity in the analyzeShardKey command currently relies on calculating the correlation coefficient between the WiredTiger RecordIds in the index store (i.e. y) and 1, ..., N (i.e. x), where N is the number of RecordIds. Given that each RecordId is 8 bytes, for a collection with 10 million unique shard key values the check would involve storing ~2 * 80MB of x and y values in memory. While the check itself should be fast, that much memory usage can still have a non-negligible impact on the server. To that end, we should consider calculating the correlation coefficient in batches of size N', where N' is more manageable. This paper describes a way to average correlation coefficients (r): transform each r value using Fisher's z-transformation (z = arctanh(r)), take the average of the z values, and convert the result back to an r value.
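
      As a rough illustration, below is a minimal Python sketch of the batched calculation. The function names, the batch size, the clamping of r away from ±1 (where arctanh is infinite), and the (n - 3) weighting of the per-batch z values (the conventional inverse-variance weight for Fisher's z) are all assumptions made for illustration; the actual implementation would live in the server's C++ code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes both sequences have nonzero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def batched_monotonicity_r(record_ids, batch_size=10_000):
    """Estimate the correlation between index position (x = 1..N) and
    RecordId (y) by averaging per-batch correlations in Fisher z-space."""
    zs, weights = [], []
    for start in range(0, len(record_ids), batch_size):
        batch = record_ids[start:start + batch_size]
        if len(batch) < 4:
            continue  # too few points for a meaningful r (weight n - 3 <= 0)
        xs = list(range(start + 1, start + 1 + len(batch)))
        r = pearson_r(xs, batch)
        r = max(-0.999999, min(0.999999, r))  # keep arctanh finite at |r| = 1
        zs.append(math.atanh(r))              # Fisher z = arctanh(r)
        weights.append(len(batch) - 3)        # inverse-variance weight for z
    if not weights:
        raise ValueError("not enough RecordIds to estimate correlation")
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(mean_z)                  # back-transform: r = tanh(z)
```

      With batching, only one batch of x and y values (plus the per-batch z values) needs to be held in memory at a time, e.g. ~2 * 80KB for a batch of 10,000 RecordIds instead of ~2 * 80MB for the whole collection.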

            Assignee:
            Backlog - Cluster Scalability
            Reporter:
            Cheahuychou Mao
            Votes:
            0
            Watchers:
            3
