[SERVER-44288] How can I guarantee near values ranges of chunks to be in the same shard? Created: 28/Oct/19  Updated: 27/Oct/23  Resolved: 14/Nov/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.13
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Nikolaos Koutroumanis Assignee: Kaloian Manassiev
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Sharding 2019-11-04, Sharding 2019-11-18
Participants:

 Description   

Given the [example|https://docs.mongodb.com/manual/tutorial/view-sharded-cluster-configuration/] of range partitioning from the official MongoDB documentation, we see that the shard key is based on the zip field. However, looking at the cluster details, we notice that shard0001 contains two chunks with the ranges [1,56000) and [57500,58140). Wouldn't we expect the intermediate ranges [56000,56800), [56800,57088) and [57088,57500) to be stored on shard0001 as well?

When hearing about range partitioning, I expect every shard to contain a continuous range of shard key values. Is it possible to achieve that without defining zones?
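
The gap in question can be checked mechanically. Below is a minimal sketch, assuming the chunk ranges of a shard are available as half-open (min, max) integer pairs; the helper name `contiguous_runs` is hypothetical and not part of any MongoDB API. It merges adjacent chunks into maximal continuous runs, so a shard with continuous shard keys would yield exactly one run.

```python
def contiguous_runs(chunks):
    """Merge half-open [lo, hi) chunk ranges into maximal continuous runs."""
    runs = []
    for lo, hi in sorted(chunks):
        if runs and runs[-1][1] == lo:
            # This chunk starts exactly where the previous run ends: extend it.
            runs[-1] = (runs[-1][0], hi)
        else:
            # There is a gap before this chunk: start a new run.
            runs.append((lo, hi))
    return runs


# Chunks on shard0001 as listed in the documentation example.
shard0001 = [(1, 56000), (57500, 58140)]
print(contiguous_runs(shard0001))
```

For the two chunks above the result has two runs, which reflects the missing range [56000,57500) held by other shards.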



 Comments   
Comment by Nikolaos Koutroumanis [ 14/Nov/19 ]

Sorry for the late response,

Currently I don't have anything specific in mind. Now I understand why continuity breaks. I'll think about whether there is any other way to keep continuity as much as possible. I'll be back.

Cheers,

Nikos

Comment by Kaloian Manassiev [ 29/Oct/19 ]

Hi nickkoutr,

When the number of chunks is imbalanced between shards, the current balancer logic picks the chunk that sorts earliest on the shard with the most chunks. So unfortunately there is no way to achieve continuity currently.

Say you have a zone covering [0, 4) with 2 shards that hold the following chunks:
S1 - [0, 1), [1, 2), [2, 3)
S2 - [3, 4)

This policy will currently pick [0, 1) and move it to S2, which breaks the continuity:
S1 - [1, 2), [2, 3)
S2 - [0, 1), [3, 4)
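
The example above can be reproduced with a small simulation. This is only a sketch of the policy as described in this comment, not the actual balancer code: the function name `balance_step`, the dictionary layout, and the `min_diff` threshold of 2 are assumptions made for illustration.

```python
def balance_step(shards, min_diff=2):
    """Move one chunk from the shard with the most chunks to the one with the fewest.

    shards maps a shard name to a list of half-open (lo, hi) chunk ranges.
    min_diff is an assumed imbalance threshold, not a real balancer setting.
    Returns the moved chunk, or None if the shards are considered balanced.
    """
    donor = max(shards, key=lambda name: len(shards[name]))
    recipient = min(shards, key=lambda name: len(shards[name]))
    if len(shards[donor]) - len(shards[recipient]) < min_diff:
        return None
    # Pick the chunk that sorts earliest on the most loaded shard.
    chunk = min(shards[donor])
    shards[donor].remove(chunk)
    shards[recipient].append(chunk)
    shards[recipient].sort()
    return chunk


shards = {
    "S1": [(0, 1), (1, 2), (2, 3)],
    "S2": [(3, 4)],
}
moved = balance_step(shards)
print(moved)   # (0, 1)
print(shards)  # {'S1': [(1, 2), (2, 3)], 'S2': [(0, 1), (3, 4)]}
```

After the step, S2 holds [0, 1) and [3, 4), which are not adjacent, so the continuity is lost even though the chunk counts are now equal.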

That being said, we are now researching possible ways to improve the balancer policy, and one of the things we are considering is making it try to keep continuity as much as possible. Do you have a particular use case in mind that depends on the continuity of the chunks? If so, would you mind describing it to me?

Best regards,
-Kal.
