Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: analyzeshardkey-feedback
Assigned Teams: Cluster Scalability
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Users currently have no way to find out how much data would live on each shard after they shard a collection. One workaround is to run an aggregation with $bucketAuto, grouping by the shard key with one bucket per shard. This returns the number of documents in each shard key range, from which the data size per range can be estimated. In addition, users have to run a $facet aggregation to find the number of distinct shard key values in each $bucketAuto range.
We should add this information to the analyzeShardKey() output, because analyzeShardKey() already scans the index to generate split points.

Atlas [mongos] testdata> db.bigCollection.aggregate([
...   { $sort: { card_number: 1 } },
...   { $bucketAuto: {
...       groupBy: "$card_number",   // shard key
...       buckets: 3,                // one bucket per shard (3 shards)
...       output: {
...         count: { $sum: 1 }       // count docs per bucket
...       }
...   } }
... ])

[
 {
  _id: { min: '0000010055728383', max: '3333511084616141' },
  count: 333333333
 },
 {
  _id: { min: '3333511084616141', max: '6666504640041919' },
  count: 333333333
 },
 {
  _id: { min: '6666504640041919', max: '9999999995035362' },
  count: 333333334
 }
] 
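As a rough illustration of the "calculate the data size" step, the sketch below multiplies each bucket's document count by an average document size. It assumes documents are roughly uniform in size; the average would come from something like `db.bigCollection.stats().avgObjSize`, and the 250-byte value used here is made up. `estimateShardSizes` is a hypothetical helper, not part of MongoDB.

```javascript
// Hypothetical helper: estimate the data each prospective shard would hold,
// assuming documents are roughly uniform in size (avgObjSize is an input).
function estimateShardSizes(buckets, avgObjSize) {
  return buckets.map((b, i) => ({
    shard: i + 1,                   // bucket position, used as a shard label
    min: b._id.min,
    max: b._id.max,
    docs: b.count,
    estBytes: b.count * avgObjSize  // document count x average document size
  }));
}

// Counts copied from the $bucketAuto output above; 250 bytes is a made-up average.
const buckets = [
  { _id: { min: '0000010055728383', max: '3333511084616141' }, count: 333333333 },
  { _id: { min: '3333511084616141', max: '6666504640041919' }, count: 333333333 },
  { _id: { min: '6666504640041919', max: '9999999995035362' }, count: 333333334 },
];

for (const r of estimateShardSizes(buckets, 250)) {
  console.log(`shard ${r.shard}: ${r.docs} docs, ~${(r.estBytes / 1e9).toFixed(1)} GB`);
}
```

With these counts and a 250-byte average, each line reports about 83.3 GB. The estimate is only as good as the uniform-size assumption; collections with highly variable document sizes would need per-range size totals instead.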

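db.bigCollection.aggregate([
  { $sort: { card_number: 1 } },
  { $facet: {
    // count distinct shard key values in each $bucketAuto range
    shard1: [ { $match: { card_number: { $gte: '0000010055728383', $lt: '3333511084616141' } } }, { $group: { _id: "$card_number" } }, { $count: "count" } ],
    shard2: [ { $match: { card_number: { $gte: '3333511084616141', $lt: '6666504640041919' } } }, { $group: { _id: "$card_number" } }, { $count: "count" } ],
    // the last $bucketAuto bucket's max is inclusive, so $lt here excludes
    // the value '9999999995035362'; $lte would include it
    shard3: [ { $match: { card_number: { $gte: '6666504640041919', $lt: '9999999995035362' } } }, { $group: { _id: "$card_number" } }, { $count: "count" } ]
  } }
])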

[
  {
    shard1: [ { count: 333333306 } ],
    shard2: [ { count: 333333320 } ],
    shard3: [ { count: 333333317 } ]
  }
]
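To show what a combined per-range report could look like, the sketch below merges the two outputs above into one record per range: document count, number of distinct shard key values, and the average number of documents per value. `summarizeRanges` and its field names are hypothetical and are not analyzeShardKey() output. Note that the shard3 distinct count comes from a query that used `$lt` on the last bucket's inclusive upper bound, so it may miss one value.

```javascript
// Hypothetical helper: merges the $bucketAuto counts with the $facet
// distinct-value counts into one summary per shard key range.
function summarizeRanges(buckets, facetDoc) {
  return buckets.map((b, i) => {
    const name = `shard${i + 1}`;           // $facet field for this range
    const facet = facetDoc[name] || [];
    // $count emits no document when a range matches nothing
    const distinct = facet.length > 0 ? facet[0].count : 0;
    return {
      min: b._id.min,
      max: b._id.max,
      docs: b.count,
      distinctValues: distinct,
      // average documents per shard key value; close to 1 means the key
      // is nearly unique within the range
      docsPerValue: distinct > 0 ? b.count / distinct : null,
    };
  });
}

// Values copied from the two outputs above.
const bucketOut = [
  { _id: { min: '0000010055728383', max: '3333511084616141' }, count: 333333333 },
  { _id: { min: '3333511084616141', max: '6666504640041919' }, count: 333333333 },
  { _id: { min: '6666504640041919', max: '9999999995035362' }, count: 333333334 },
];
const facetOut = [
  { shard1: [{ count: 333333306 }], shard2: [{ count: 333333320 }], shard3: [{ count: 333333317 }] },
];

for (const r of summarizeRanges(bucketOut, facetOut[0])) {
  const ratio = r.docsPerValue === null ? 'n/a' : r.docsPerValue.toFixed(6);
  console.log(`[${r.min}, ${r.max}]: ${r.docs} docs, ${r.distinctValues} distinct, ${ratio} docs/value`);
}
```

Here every range averages very close to one document per key value, which is the kind of per-range detail that would be convenient to get from a single analyzeShardKey() call instead of two full aggregations.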

Assignee: Unassigned
Reporter: Ratika Gandhi
Participants: Ratika Gandhi
Votes: 0
Watchers: 3

Created: Jul 07 2025 03:53:35 PM UTC
Updated: Aug 08 2025 05:36:03 PM UTC
