analyzeShardKey: make index scan proportional to sampleSize, not collection size

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Cluster Scalability

      When running analyzeShardKey with keyCharacteristics: true, both the monotonicity pass and the cardinality/frequency pass perform a full sequential scan of the supporting index regardless of sampleSize. The sampling parameters only control how many results are kept, not how many index keys are read. For large collections (1B+ documents) with a small sampleSize (e.g. 1M), execution time is dominated by scanning ~1B index keys twice, even though only 1M values are ultimately used.
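      The read-vs-keep distinction can be sketched as follows. This is a purely illustrative Python model of the Bernoulli-sampling pattern described above, not the server's actual code; the names are made up for the example:

```python
import random

def bernoulli_sample(index_keys, sample_size):
    """Model of the current behavior: scan EVERY index key, keeping
    each one with probability sample_size / len(index_keys)."""
    rate = sample_size / len(index_keys)
    keys_read = 0
    sample = []
    for key in index_keys:
        keys_read += 1          # every key is read...
        if random.random() < rate:
            sample.append(key)  # ...but only ~sample_size are kept
    return sample, keys_read

# 1M-key "index" standing in for the 1B-key case in the description.
keys = list(range(1_000_000))
sample, keys_read = bernoulli_sample(keys, sample_size=1_000)
print(keys_read)    # 1_000_000: reads scale with collection size
print(len(sample))  # ~1_000: only the kept values scale with sampleSize
```

      The point of the sketch: keys_read equals the index size no matter how small sample_size is, which is why runtime tracks collection size rather than sampleSize.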

      Both calculateMonotonicity and the cardinality/frequency aggregation scan nearly the entire supporting index even when sampleSize << collection size. For sampleSize = 1M on a 1B-document collection, each pass iterates ~1B index keys to probabilistically collect 1M samples (sampling rate = 1M / 1B = 0.001).
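      By contrast, a sampleSize-proportional approach would read only O(sampleSize) keys. The sketch below is one hypothetical shape of such a fix, not the server's planned implementation; the seek callback is an assumed positional index lookup invented for the example:

```python
import random

def position_sample(index_size, sample_size, seek):
    """Choose sample_size random index positions up front, then seek
    directly to each, so keys read is O(sample_size), not O(index_size).
    `seek(pos)` is a hypothetical positional lookup into the index."""
    positions = sorted(random.sample(range(index_size), sample_size))
    return [seek(pos) for pos in positions]  # sample_size reads total

keys = list(range(1_000_000))
sample = position_sample(len(keys), 1_000, seek=lambda i: keys[i])
print(len(sample))  # exactly 1_000 keys read, regardless of index size
```

      With this shape, doubling the collection size leaves the number of index reads unchanged, matching the requested behavior of runtime proportional to sampleSize.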

      We need this improvement for the sharding key advisor epic (https://jira.mongodb.org/browse/CLOUDP-376926): the current behavior will likely not perform acceptably on very large customer collections. The desired end state is that analyzeShardKey runtime is proportional to the sampleSize parameter, not the collection size.

            Assignee:
            Unassigned
            Reporter:
            Alex Dambrouski
            Votes:
            0
            Watchers:
            1
