[DOCS-7880] Comment on: "manual/core/sharding-introduction.txt" Created: 16/May/16  Updated: 03/Nov/17  Resolved: 20/May/16

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: 01112017-cleanup

Type: Bug Priority: Major - P3
Reporter: Docs Collector User (Inactive) Assignee: Ravind Kumar (Inactive)
Resolution: Done Votes: 0
Labels: collector-298ba4e7
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongodb 3.2

Location: https://docs.mongodb.com/manual/core/sharding-introduction/
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0
Referrer: https://docs.mongodb.com/?_ga=1.255109824.958390077.1461633920
Screen Resolution: 1366 x 768


Participants:
Days since reply: 7 years, 39 weeks, 2 days ago

 Description   

If the mongoDB has the balancer that can migrating the chunks in shards to maintaing the data balance.why you say "range based partitioning can result in an uneven distribution of data, which may negate some of the benefits of sharding. For example, if the shard key is a linearly increasing field, such as time, then all requests for a given time range will map to the same chunk, and thus the same shard."



 Comments   
Comment by Ravind Kumar (Inactive) [ 16/May/16 ]

Hello Huang,

We're working on revising these pages to make distinctions like this clearer. Until then, I hope the following explanation helps:

With ranged-based partitioning, each chunk represents a range of shard key values with an inclusive lower and exclusive upper bound.

For example:

    chunkA : { lowerBound : 1, upperBound : 10 },
    chunkB : { lowerBound: 10, upperBound: 20 }

With this, the balancer would route new documents with a shard key value of 1 - 9 to chunk A, and 10 - 19 to chunk B.

The reason for the warning is due to two special chunk ranges - minKey and maxKey. In order to ensure data is always stored on at least one chunk, there is a chunk that is given a lower bound of minKey, and a chunk given an upper bound of maxKey.

minKey always compares as less than any other possible value. maxKey always compares as greater than any possible value.

So, using the previous example:

    chunkA : { lowerBound : minKey, upperBound : 10 },
    chunkB : { lowerBound: 10, upperBound: maxKey }

Now, if I have a key with value -1, -100, or -10000, that document would route to chunkA. If I have a key with 11, 110, and 11000, that document would go to chunkB.

With monotonically increasing shard keys - or any shard key where the value is always increasing at a fixed rate - all new documents are routed to the chunk with maxKey as the upper bound. If there is a chunk split, one of the new chunks now contain the maxKey upper limit.

So if I have a shard key that increases, such as a timestamp, all of my new data always ends up in a single chunk. The shard containing that chunk now receives all write operations, creating a bottleneck.

It goes both ways - if I have a shard key that is always decreasing at a fixed rate, those documents are routed to the chunk with minKey as the lower bound, and the same bottleneck occurs.

I hope this answers your question - please keep an eye on our documentation in the future, as we are planning updates to clarify many points on sharding.

Generated at Thu Feb 08 07:55:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.