[DOCS-7145] Clarify the manual on sharding existing collection size limit. Created: 12/Feb/16  Updated: 30/Oct/23  Resolved: 09/Jun/16

Status: Closed
Project: Documentation
Component/s: manual
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Task Priority: Major - P3
Reporter: Wan Bachtiar Assignee: Ravind Kumar (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by DOCS-7254 Comment on: "manual/reference/limits.... Closed
Participants:
Time since reply: 7 years, 35 weeks, 1 day

 Description   

*edit* Added note: the 8,192 split point limit does not apply to the initial sharding of a collection (noted here so the ticket description does not create the mistaken impression that it does).

It would be good to revisit and clarify the manual on:

https://docs.mongodb.org/manual/reference/limits/#Sharding-Existing-Collection-Data-Size

We should better explain how the two sizes (256 GB and 400 GB) were calculated/estimated.

Also, it would be great to revisit the table:

  • Clarifying how the number of splits is calculated (see the sketch below).
  • Removing the 1 MB chunk size / max collection size row, since in the context of sharding an existing collection it is generally beneficial to increase the chunk size rather than reduce it.
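
For reference, a minimal sketch of that estimate in Python, assuming the formulas the limits page documents (maxSplits = 16 MB / average shard key value size; maxCollectionSize ~= maxSplits * chunkSize / 2); the function and variable names are illustrative, not from the server source:

# Minimal sketch, assuming the limits-page estimate:
#   maxSplits = 16 MB / <average size of shard key values>
#   maxCollectionSize ~= maxSplits * (chunk size / 2)
MAX_BSON_SIZE = 16 * 1024 ** 2  # 16 MB, the maximum BSON document size

def max_collection_size_bytes(avg_key_size_bytes, chunk_size_bytes):
    max_splits = MAX_BSON_SIZE // avg_key_size_bytes
    return max_splits * (chunk_size_bytes // 2)

# 512-byte shard key values with the default 64 MB chunk size -> 1.0 TiB
print(max_collection_size_bytes(512, 64 * 1024 ** 2) / 1024 ** 4)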

NOTE: For 3.4, need to revisit this as there are limits for empty collections with hashed shard keys.



 Comments   
Comment by Randolph Tan [ 14/Jun/16 ]

ravind.kumar: the numInitialChunks hard limit is only for v3.4. The rest should be the same for older versions of MongoDB.

Comment by Ravind Kumar (Inactive) [ 13/Jun/16 ]

renctan, is there anything here that would not apply to 3.0, or possibly 2.6? This might be worth backporting.

Comment by Githook User [ 13/Jun/16 ]

Author: ravind (rkumar-mongo) <ravind.kumar@10gen.com>

Message: DOCS-7145: limits for sharding existing data

Signed-off-by: kay <kay.kim@10gen.com>
Branch: master
https://github.com/mongodb/docs/commit/7b6fda5517da0a730c5b5e917038e2038f05d109

Comment by Ravind Kumar (Inactive) [ 09/Jun/16 ]

https://github.com/mongodb/docs/pull/2643

Comment by Randolph Tan [ 06/Jun/16 ]

> Randolph Tan, apologies for leaving out the context. I was referring to comments on DOCS-7145 and HELP-1859. Thanks for clearing that up, I appreciate it.

In the first comment, I believe Asya was referring to sharding a collection with existing data. In this scenario, mongos will create new chunks for the collection (as if splitting the min→max range into several chunks). The second one refers to the user calling the split command explicitly. Note that there is a special case where the 8192 limit applies, as demonstrated in SERVER-22430: sharding an empty collection with a hashed shard key while specifying numInitialChunks.

Also, in the second calculation, this line doesn't quite make sense to me: splitPointsRequired = <average document size> / <shard key size>. Generally speaking, a document contains exactly one shard key.

If my understanding of the formula is correct, both sizes refer to BSON object sizes. If that's the case, I also don't follow how this formula came about.

Which means that for the v3.2 docs, the limit is only the minimum of 1,000,000 and (max BSON size / shard key size).

Note: the 1,000,000 limit was added together with the 8192 * nShards limit. In other words, this check did not exist in v3.2.
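
(For illustration only: a rough Python sketch of the caps as described above, taking the closing note at face value that the 1,000,000 and 8192 * nShards checks arrived after v3.2. Names are hypothetical, not server code.)

MAX_BSON_SIZE = 16 * 1024 ** 2  # 16 MB maximum BSON document size

def max_split_points_v32(avg_key_size_bytes):
    # v3.2: only max BSON size / shard key size caps the split points
    # (per the note above, the 1,000,000 check did not exist yet).
    return MAX_BSON_SIZE // avg_key_size_bytes

def max_split_points_v34(avg_key_size_bytes, num_shards):
    # v3.4 adds the 1,000,000 and 8192 * nShards checks on top.
    return min(max_split_points_v32(avg_key_size_bytes),
               1_000_000,
               8192 * num_shards)

print(max_split_points_v32(8))      # 2097152 for an 8-byte (64-bit) key
print(max_split_points_v34(8, 16))  # 131072 on a 16-shard cluster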

Comment by Ravind Kumar (Inactive) [ 02/Jun/16 ]

wan.bachtiar, my understanding is that the first calculation uses the maximum BSON document size of 16 MB. While this is good for estimating the maximum collection size based on shard key size and chunk size, I imagine most customers do not approach that limit very often. So the second formula would simply use the average document size of the target collection instead of the maximum BSON document size.

For example, a 64-bit shard key with a 64 MB chunk size would allow for up to 8 TB of data, requiring 16 shards to support every split point (3.4+). But if the customer's average document size is only 4 MB, the number of split points would be 4x lower, as would the number of shards. I wouldn't want a customer to look at the formula / table and end up with a much larger number of shards than they actually need for their collection. Maybe I'm overestimating the issue here.
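
(Again purely illustrative: a Python check of the arithmetic behind that example, using the sizes from the comment and the 8192-per-shard cap discussed earlier in this ticket.)

import math

CHUNK_SIZE = 64 * 1024 ** 2       # 64 MB chunk size
COLLECTION_SIZE = 8 * 1024 ** 4   # the 8 TB figure from the example above

chunks_needed = COLLECTION_SIZE // CHUNK_SIZE    # 131072 chunks / split points
shards_needed = math.ceil(chunks_needed / 8192)  # 16 shards under the 3.4+ cap
print(chunks_needed, shards_needed)              # 131072 16

# A 4 MB average document size instead of the 16 MB BSON maximum scales the
# split point estimate down 4x, and with it the number of shards required.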

Comment by Ravind Kumar (Inactive) [ 25/May/16 ]

I've updated the code review based on some of the discussions here. Please review when you get a chance. wan.bachtiar, I folded in the number of shards as a measure of the minimum number needed to support a given number of split points.

Comment by Asya Kamsky [ 31/Mar/16 ]

> Does the maximum split limit also affect existing sharded clusters that are growing significantly, or just existing collections that need to be sharded?

All of this discussion is ONLY applicable to enabling sharding on an existing non-sharded collection. None of the discussion applies to already sharded collections.

Comment by Asya Kamsky [ 22/Mar/16 ]

Please note that this ticket and DOCS-7254 are marked as duplicates, but both are still open.

Anyway, the 8192 limit is a non-issue for the initial sharding of a collection. It only applies when running the split command manually.
