[DOCS-718] Documentation on routing of queries in the presence of only part of a compound shard key is incorrect Created: 05/Nov/12  Updated: 08/Nov/12  Resolved: 08/Nov/12

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: William Zola Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

http://docs.mongodb.org/manual/core/sharding-internals/#sharding-internals-shard-keys


Issue Links:
Depends
Participants:
Days since reply: 11 years, 14 weeks, 6 days ago

 Description   

In http://docs.mongodb.org/manual/core/sharding-internals/#sharding-internals-shard-keys, the information about routing in the presence of only part of a shard key is incorrect.



 Comments   
Comment by auto [ 08/Nov/12 ]

Author: Sam Kleinman <samk@10gen.com> (2012-11-08T15:00:34Z)

Message: DOCS-718 tweaking sharding routing text
Branch: master
https://github.com/mongodb/docs/commit/bc406a4b809bc92687a2d44da1ac6e397ab97725

Comment by auto [ 07/Nov/12 ]

Author: Sam Kleinman <samk@10gen.com> (2012-11-07T19:05:35Z)

Message: DOCS-718 additional clarification
Branch: master
https://github.com/mongodb/docs/commit/3374b3960a51472c29368b1fa5b17d6c6fe72ee3

Comment by auto [ 06/Nov/12 ]

Author: Sam Kleinman <samk@10gen.com> (2012-11-06T22:10:37Z)

Message: DOCS-718 corrections and improvements to sharded query routing cross references
Branch: master
https://github.com/mongodb/docs/commit/985bdcfe82d2a95435ebc59036f1de6bb2ced546

Comment by William Zola [ 05/Nov/12 ]

Here is the correct information:

1) MongoDB uses the entire shard key to create an ordering of all the documents in the sharded collection. The chunks are split based on that ordering.

Consider a compound shard key made up of four fields:

{field1:1, field2:1, field3:1, field4:1}

The fields of the shard key are used to create a full ordering of all the documents in the collection. One way of thinking about this is that this shard key defines a four-digit number, with 'field1' being the most significant digit, and 'field4' being the least significant.
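
The "four-digit number" analogy can be sketched in plain JavaScript. This is only an illustration, not MongoDB's implementation: the sample documents and the compareByShardKey helper are hypothetical, and the comparison assumes that each field holds values of a single type (real BSON ordering across types is more involved).

```javascript
// Field order of the example compound shard key, most significant first.
const SHARD_KEY_FIELDS = ["field1", "field2", "field3", "field4"];

// Compare two documents by shard key, the way digits of a number are compared:
// the first field that differs decides the order.
function compareByShardKey(a, b) {
  for (const field of SHARD_KEY_FIELDS) {
    if (a[field] < b[field]) return -1;
    if (a[field] > b[field]) return 1;
  }
  return 0;
}

// Hypothetical sample documents.
const docs = [
  { field1: "b", field2: 1, field3: 1, field4: 1 },
  { field1: "a", field2: 2, field3: 1, field4: 9 },
  { field1: "a", field2: 2, field3: 1, field4: 3 },
  { field1: "a", field2: 1, field3: 7, field4: 5 },
];

const sorted = [...docs].sort(compareByShardKey);
for (const d of sorted) {
  console.log(SHARD_KEY_FIELDS.map((f) => d[f]).join(","));
}
// prints:
// a,1,7,5
// a,2,1,3
// a,2,1,9
// b,1,1,1
```

Note that 'field4' only matters when the three fields to its left are equal, which is why it behaves like the least significant digit.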

If the fields that you provide in a query are sufficiently selective, then 'mongos' will be able to route the query, at least partially. If they are not sufficiently selective, then 'mongos' will do a scatter/gather query.

This means that the only way to guarantee that a query will be a routed query is for it to contain all of the fields that make up the shard key.

2) MongoDB will make a "best effort" attempt to limit the number of shards that a query will be routed to. The success of this effort depends on the specific data distribution and chunk distribution of your particular data set.

This is easier to see with an example.

Let's say that you're using the shard key from above:

{field1:1, field2:1, field3:1, field4:1}

Assume a cluster containing 4 different shards.

Consider what happens when you do a query of the form db.coll.find({field1:"a"}). There are two different possibilities to consider.

One is that all chunks with a shard key that contains {field1:"a"} exist on a single shard; the other is that the range which contains {field1:"a"} has been split (possibly on different values of 'field4'), and the documents which contain {field1:"a"} live on multiple shards. As you can see, it's easy for documents which satisfy this query to live on all 4 shards in the cluster.

Given this situation, MongoDB will make a "best effort" attempt to limit the number of shards that have to run this query. It will do this by examining the chunk ranges for the fields that have been specified in the query, to see if it can limit the number of shards to which it will route that query.

For example, based on the query {field1:"a"}, the 'mongos' process might determine that the matching documents could live on one shard, two shards, three shards, or all shards.

Based on this determination, the 'mongos' process will route the query appropriately.
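
The chunk-range check described above can be modelled with a small toy in JavaScript. This is a sketch under stated assumptions, not the actual 'mongos' code: the chunk table, the MIN and MAX sentinels (standing in for MinKey and MaxKey), and the targetShards helper are all hypothetical, and only equality matches on a leading prefix of the shard key are handled.

```javascript
// Sentinels standing in for the smallest and largest possible key values.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare two single values, treating MIN as below and MAX as above everything.
function compareValues(x, y) {
  if (x === y) return 0;
  if (x === MIN || y === MAX) return -1;
  if (x === MAX || y === MIN) return 1;
  return x < y ? -1 : 1;
}

// Compare two full shard key values field by field, left to right.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    const c = compareValues(a[i], b[i]);
    if (c !== 0) return c;
  }
  return 0;
}

// Hypothetical chunk table: each chunk covers the key range [min, max).
const chunks = [
  { min: [MIN, MIN, MIN, MIN], max: ["a", 5, MIN, MIN], shard: "shard0" },
  { min: ["a", 5, MIN, MIN], max: ["a", 9, 2, 40], shard: "shard1" },
  { min: ["a", 9, 2, 40], max: ["c", MIN, MIN, MIN], shard: "shard2" },
  { min: ["c", MIN, MIN, MIN], max: [MAX, MAX, MAX, MAX], shard: "shard3" },
];

// prefix holds equality values for the leading shard key fields,
// e.g. ["a"] for {field1:"a"} or ["a", 9] for {field1:"a", field2:9}.
function targetShards(prefix, keyLength = 4) {
  const pad = (fill) => [...prefix, ...Array(keyLength - prefix.length).fill(fill)];
  const lo = pad(MIN); // smallest key that matches the prefix
  const hi = pad(MAX); // largest key that matches the prefix
  const shards = new Set();
  for (const chunk of chunks) {
    // [min, max) overlaps the closed range [lo, hi] when min <= hi and max > lo.
    if (compareKeys(chunk.min, hi) <= 0 && compareKeys(chunk.max, lo) > 0) {
      shards.add(chunk.shard);
    }
  }
  return [...shards].sort();
}

console.log(targetShards(["a"]));           // [ 'shard0', 'shard1', 'shard2' ]
console.log(targetShards(["a", 9]));        // [ 'shard1', 'shard2' ]
console.log(targetShards(["a", 9, 2, 40])); // [ 'shard2' ]
```

With this particular chunk table, {field1:"a"} alone still touches three of the four shards, while supplying the full key narrows the query to one. A different set of split points would give different results for the same queries.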

3) Since the chunks form a strict ordering, the key ordering is significant. In particular:

  • No routing can be done if the most significant portion of the key ('field1' in this example) is not present
  • The greater the number of shard key elements that are present (going from left to right), the higher the likelihood that the query can be routed to a subset of the shards. (For example, having both 'field1' and 'field2' present in the query will increase the chances that the query can be routed to a subset of the shards; adding 'field3' will increase the chances still further.)
  • Whether or not a query can be routed to a subset of shards is dependent on the data distribution, where the chunk splits are, and which chunks live on which shards
  • Missing fields in the left-to-right order mean that the fields to the right cannot be used for routing. For example, if the query includes 'field1' and 'field4', then 'field4' cannot be used for routing; if the query includes 'field1', 'field2' and 'field4', then only 'field1' and 'field2' will be used for routing.
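
The left-to-right rule in the list above can be expressed as a short JavaScript helper. This is a hypothetical sketch, not a MongoDB API: routableFields is an invented name, and it assumes the query uses equality matches on the fields it names.

```javascript
// Field order of the example compound shard key.
const SHARD_KEY_ORDER = ["field1", "field2", "field3", "field4"];

// Return the leading shard key fields that a query could use for routing.
// Scanning stops at the first shard key field the query does not include.
function routableFields(queryFields) {
  const present = new Set(queryFields);
  const usable = [];
  for (const field of SHARD_KEY_ORDER) {
    if (!present.has(field)) break; // a gap: fields to the right cannot help
    usable.push(field);
  }
  return usable;
}

console.log(routableFields(["field1", "field4"]));           // [ 'field1' ]
console.log(routableFields(["field1", "field2", "field4"])); // [ 'field1', 'field2' ]
console.log(routableFields(["field2", "field3"]));           // []
```

An empty result corresponds to the first bullet: without 'field1', no routing can be done. A non-empty result only means routing is possible; how many shards are actually excluded still depends on the chunk boundaries.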