[DOCS-718] Documentation on routing of queries in the presence of only part of a compound shard key is incorrect Created: 05/Nov/12 Updated: 08/Nov/12 Resolved: 08/Nov/12 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | William Zola | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
http://docs.mongodb.org/manual/core/sharding-internals/#sharding-internals-shard-keys |
||
| Issue Links: |
|
||||
| Participants: | |||||
| Days since reply: | 11 years, 14 weeks, 6 days ago | ||||
| Description |
|
In http://docs.mongodb.org/manual/core/sharding-internals/#sharding-internals-shard-keys, the information about routing in the presence of only part of a shard key is incorrect. |
| Comments |
| Comment by auto [ 08/Nov/12 ] |
|
Author: {u'date': u'2012-11-08T15:00:34Z', u'email': u'samk@10gen.com', u'name': u'Sam Kleinman'}Message: |
| Comment by auto [ 07/Nov/12 ] |
|
Author: {u'date': u'2012-11-07T19:05:35Z', u'email': u'samk@10gen.com', u'name': u'Sam Kleinman'}Message: |
| Comment by auto [ 06/Nov/12 ] |
|
Author: {u'date': u'2012-11-06T22:10:37Z', u'email': u'samk@10gen.com', u'name': u'Sam Kleinman'}Message: |
| Comment by William Zola [ 05/Nov/12 ] |
|
Here is the correct information: 1) MongoDB uses the entire shard key to create an ordering of all the documents in the sharded collection. The chunks are split based on that ordering. Consider a compound shard key made up of four fields: {field1:1, field2:1, field3:1, field4:1}. The fields of the shard key are used to create a full ordering of all the documents in the collection. One way of thinking about this is that this shard key defines a four-digit number, with 'field1' being the most significant digit, and 'field4' being the least significant. If the fields that you in a query provide are sufficiently selective, then 'mongos' will be able to route the query, at least partially. If they are not sufficiently selective, then 'mongos' will do a scatter/gather query. This means that the only way to guarantee that a query will be a routed query if it contains all of the fields that make up the shard key. 2) MongoDB will make a "best effort" attempt to limit the number of shards that a query will be routed to. The success of this effort depends on the specific data distribution and chunk distribution of your particular data set. This is easier to see with an example. Let's say that you're using the shard key from above: {field1:1, field2:1, field3:1, field4:1}. Assume a cluster containing 4 different shards. Consider what happens when you do a query of the form db.coll.find( {field1:"a"}); There are two different possibilities to consider. One is that all chunks with a shard key that contains {field1:"a"}exist on a single shard; the other is that the range which contains {field1:"a"}has been split (possibly on different values of 'field4'), and the documents which contain {field1:"a"}live on multiple shards. As you can see, it's easy for documents which satisfy this query to live on all 4 shards in the cluster. Given this situation, MongoDB will make a "best effort" attempt to limit the number of shards that have to run this query. It will do this by examining the chunk ranges for the fields that have been specified in the query, to see if it can limit the number of shards to which it will route that query. For example, based on the query {field1:"a"}, the 'mongos' process might determine that all of the documents might live on one shard; two shards; three shards; or all shards. Based on this determination, the 'mongos' process will route the query appropriately. 3) Since the chunks form a strict ordering, the key ordering is significant. In particular:
|