- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
Consuming information about numReadsByRanges and numWritesByRanges is only valuable when a user can drill down into which ranges are read- or write-heavy. Without that functionality, the reads/writes-by-ranges output often raises more questions than it answers. Due to $sample issues, it can also block the percentage-of-reads and percentage-of-writes output that analyzeShardKey with readWriteDistribution: true provides.
When analyzeShardKeyNumRanges is 1, we skip calling getNext() on the $sample aggregation entirely, so we should default it to 1.
If users would like to see the ranges output, they can use setParameter to set analyzeShardKeyNumRanges back to 100.
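For illustration, a mongosh sketch of opting back in (the namespace testDb.testColl and shard key { userId: 1 } are made up; this assumes analyzeShardKeyNumRanges stays a runtime-settable server parameter):

// Restore the old behavior of sampling 100 ranges (run against each node
// that serves analyzeShardKey, as appropriate for the deployment).
db.adminCommand({ setParameter: 1, analyzeShardKeyNumRanges: 100 })

// Re-run the command with read/write distribution metrics enabled.
db.adminCommand({
    analyzeShardKey: "testDb.testColl",
    key: { userId: 1 },
    readWriteDistribution: true
})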
More context for posterity:
When calculating the keyCharacteristics metrics, the sampled documents are used to calculate the cardinality, frequency, and monotonicity. For that, we do want higher precision, which is why the calculation samples a large number of documents (sampleRate defaults to 1 and sampleSize defaults to 1 million) and why we purposely use an index scan instead of $sample, even though performance testing shows the cost of an index scan is non-trivial for large collections.
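As a sketch, those sampling knobs are exposed on the command itself; the namespace and key below are placeholders, and only one of sampleRate/sampleSize is set:

// Cap the keyCharacteristics calculation at 100k sampled documents instead
// of the 1 million default; sampleRate is the alternative, proportional knob.
db.adminCommand({
    analyzeShardKey: "testDb.testColl",
    key: { userId: 1 },
    keyCharacteristics: true,
    readWriteDistribution: false,
    sampleSize: 100000
})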
When calculating the readWriteDistribution metrics, the sampled documents are only used to define the chunk boundaries, so a rough estimate is acceptable. The use of $sample stems from it being the approach resharding uses to find chunk boundaries. Also, we expected $sample to return representative random samples when the shard key has good cardinality and frequency; we later discovered that this assumption doesn't always hold due to the randomness issue in $sample (see the linked WT-8003).
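For intuition only, a rough mongosh sketch of deriving chunk boundaries from $sample; this is not the server's actual implementation, and the collection and shard key field are placeholders:

// Sample numRanges - 1 shard key values and treat the sorted values as split
// points. With a skewed or low-cardinality key, or with the random-cursor
// duplicate-key issue tracked in WT-8003, the samples (and therefore the
// boundaries) can be unrepresentative.
const numRanges = 100;
const splitPoints = db.testColl.aggregate([
    { $sample: { size: numRanges - 1 } },
    { $project: { _id: 0, userId: 1 } },
    { $sort: { userId: 1 } }
]).toArray();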
- is related to: WT-8003 Fix frequent duplicate keys returned by random cursor in resharding test (Closed)