[SERVER-26661] Add option for shardCollection to wait for initial chunks to be balanced Created: 13/Oct/16  Updated: 06/Dec/22  Resolved: 15/Nov/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: features we're not sure of

Type: Improvement Priority: Major - P3
Reporter: Steven Hand Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Done Votes: 0
Labels: chunking, etl, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding EMEA
Sprint: Sharding 2016-12-12, Sharding 2017-01-02
Participants:

 Description   

This is useful for sharding huge collections and the user would want to wait for the collection to be in steady state before inserting new documents.

Note: for empty collections, the chunks should already be balanced when the shardCollection command returns successfully.

Original description

We have had a few users run into problems with querying sharded collection that are being actively balanced.

With hashed shard keys, the "shardCollection" command will create chunks and distribute them across the shards. Normally, the migration of empty chunks takes little time. We have seen cases where the shards are so overloaded that this migration does not complete before the user application starts inserting documents. Therefore, the application is actively inserting documents into a collection that is being migrated. We have seen that the shards take days or longer to finally balance.

This situation can lead to strange problems like:

  • failures returned to the client when shard metadata is stale.
  • complete collection balancing never really being achieved

This situation is made worse by the tendency of some applications to keep creating a logical set of collections, either using a different name or creating new databases. We are not entirely sure why users want to partition a single logical data set into many collections (of the same structure) but this behavior is certainly not unusual.

Unfortunately, these users often delay optimizing or upgrading their cluster to reduce the load.

To assist these users, I suggest that we add methods, callable from client applications,

  • to test whether the balancing of a collection is complete (or as complete as it will get).
  • extend the "shardCollection" command adding a boolean argument ( e.g. "waitForBalancing") to block until the migration of the empty chunks has completed

With these, the clients can create new collections, wait for the balancing of the empty chunks, then proceed with inserting documents.



 Comments   
Comment by Kaloian Manassiev [ 15/Nov/21 ]

Most of the requests in this ticket have been done under SERVER-43990.

Comment by Steven Hand [ 21/Oct/16 ]

Of course they could query the config database collections directly. Regardless, of how the data is collected, without the proposed change, the customer would have to calculate when the collection is balanced in order to avoid the adverse side-effects in this scenario.

Comment by Asya Kamsky [ 13/Oct/16 ]

are they doing "shardCollection" and giving it some huge number of chunks? If so why? They shouldn't be.

Generated at Thu Feb 08 04:12:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.