Provide a way for users to know the data size per shard for a particular shard key using analyzeShardKey


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • None
    • 3
    • TBD

      Users who want to know how much data will live on each shard after sharding a collection currently have no direct way to find out. A workaround is to run an aggregation with $bucketAuto on the candidate shard key, using one bucket per shard. This returns the number of documents per range, which can then be used to estimate the data size per shard. In addition, users have to run a $facet aggregation to find the number of distinct shard key values in each range returned by $bucketAuto.
      We should add this information to analyzeShardKey(), because it already scans the index to generate split points.

      Atlas [mongos] testdata> db.bigCollection.aggregate([
      ...  { $sort: { card_number: 1 } },
      ...  {
      ...   $bucketAuto: {
      ...    groupBy: "$card_number",     // shard key
      ...    buckets: 3,      // one bucket per shard (3 shards)
      ...    output: {
      ...     count: { $sum: 1 }  // count docs per bucket
      ...    }
      ...   }
      ...  }
      ... ])
      
      [
       {
        _id: { min: '0000010055728383', max: '3333511084616141' },
        count: 333333333
       },
       {
        _id: { min: '3333511084616141', max: '6666504640041919' },
        count: 333333333
       },
       {
        _id: { min: '6666504640041919', max: '9999999995035362' },
        count: 333333334
       }
      ] 
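
      The bucket counts above can be turned into a rough data size per shard. The following is only a sketch: `estimateShardSizes` is a hypothetical helper, and the 1024-byte average document size is a placeholder, not a value from this ticket (in practice it might come from the collection's stats). It also assumes documents are about the same size in every range, which may not hold.

```javascript
// Sketch: estimate per-shard data size from $bucketAuto counts.
// avgObjSizeBytes is an assumed input and not part of the original report.
function estimateShardSizes(buckets, avgObjSizeBytes) {
  return buckets.map((b, i) => ({
    shard: `shard${i + 1}`,          // hypothetical label, one per bucket
    min: b._id.min,
    max: b._id.max,
    docs: b.count,
    estimatedBytes: b.count * avgObjSizeBytes,
  }));
}

// Output of the $bucketAuto aggregation shown above.
const buckets = [
  { _id: { min: '0000010055728383', max: '3333511084616141' }, count: 333333333 },
  { _id: { min: '3333511084616141', max: '6666504640041919' }, count: 333333333 },
  { _id: { min: '6666504640041919', max: '9999999995035362' }, count: 333333334 },
];

// 1024 bytes per document is a placeholder value.
const estimates = estimateShardSizes(buckets, 1024);
console.log(estimates);
```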
      
       db.bigCollection.aggregate([
        {$sort: {card_number:1}},
        {$facet: {
          shard1: [{$match: {card_number:{$gte:'0000010055728383',$lt:'3333511084616141' }}},{$group: {_id: "$card_number"}},{$count: "count"}],
          shard2: [{$match: {card_number:{$gte:'3333511084616141',$lt:'6666504640041919' }}},{$group: {_id: "$card_number"}},{$count: "count"}],
          shard3: [{$match: {card_number:{$gte:'6666504640041919',$lt:'9999999995035362' }}},{$group: {_id: "$card_number"}},{$count: "count"}]
        }}
      ])
      
      [
        {
          shard1: [ { count: 333333306 } ],
          shard2: [ { count: 333333320 } ],
          shard3: [ { count: 333333317 } ]
        }
      ]
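
      Combining the two results shows how many documents in each range share a shard key value with another document. This is a sketch only: `repeatedDocsPerRange` is a hypothetical helper that relies on the `shard1`..`shard3` facet names used above. Note that the third $facet range uses $lt on its upper bound, so documents equal to the maximum key are not counted there and the last figure is approximate.

```javascript
// Sketch: compare documents per range ($bucketAuto) with distinct
// shard key values per range ($facet). Helper name is hypothetical.
function repeatedDocsPerRange(bucketResult, facetResult) {
  const facet = facetResult[0]; // $facet returns a single document
  return bucketResult.map((b, i) => {
    const name = `shard${i + 1}`;
    const distinct = facet[name][0].count;
    return {
      range: name,
      docs: b.count,
      distinct: distinct,
      repeatedDocs: b.count - distinct, // docs beyond the first per key value
    };
  });
}

// Results copied from the two aggregations above.
const bucketResult = [
  { _id: { min: '0000010055728383', max: '3333511084616141' }, count: 333333333 },
  { _id: { min: '3333511084616141', max: '6666504640041919' }, count: 333333333 },
  { _id: { min: '6666504640041919', max: '9999999995035362' }, count: 333333334 },
];
const facetResult = [
  {
    shard1: [{ count: 333333306 }],
    shard2: [{ count: 333333320 }],
    shard3: [{ count: 333333317 }],
  },
];

const summary = repeatedDocsPerRange(bucketResult, facetResult);
console.log(summary);
```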

              Assignee: Unassigned
              Reporter: Ratika Gandhi
              Votes: 0
              Watchers: 3
