Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: features we're not sure of
Affects Version/s: None
Component/s: Sharding
Labels:
- chunking
- etl
- sharding

Assigned Teams: Sharding EMEA
Sprint: Sharding 2016-12-12, Sharding 2017-01-02
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

This is useful when sharding very large collections, where the user wants to wait for the collection to reach a steady state before inserting new documents.

Note: for empty collections, the chunks should already be balanced when the shardCollection command returns successfully.

Original description

We have had a few users run into problems when querying sharded collections that are being actively balanced.

With hashed shard keys, the "shardCollection" command creates chunks and distributes them across the shards. Normally, the migration of empty chunks takes little time. However, we have seen cases where the shards are so overloaded that this migration does not complete before the user application starts inserting documents. As a result, the application is actively inserting documents into a collection that is still being migrated. In those cases we have seen the shards take days or longer to finish balancing.

This situation can lead to strange problems like:

- Failures returned to the client when shard metadata is stale.
- Complete collection balancing never really being achieved.

This situation is made worse by the tendency of some applications to keep creating a logical set of collections, either using a different name or creating new databases. We are not entirely sure why users want to partition a single logical data set into many collections (of the same structure) but this behavior is certainly not unusual.

Unfortunately, these users often delay optimizing or upgrading their cluster to reduce the load.

To assist these users, I suggest two additions, both callable from client applications:

- A method to test whether the balancing of a collection is complete (or as complete as it will get).
- A boolean argument to the "shardCollection" command (e.g. "waitForBalancing") that blocks until the migration of the empty chunks has completed.

With these, the clients can create new collections, wait for the balancing of the empty chunks, then proceed with inserting documents.
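As an illustration of the client-side workflow described above, here is a minimal Python sketch of a polling helper. It is not the server's implementation, and all names in it (`is_balanced`, `wait_for_balancing`, `fetch_chunk_counts`, `max_spread`) are hypothetical. It assumes the caller supplies a function that returns the current number of chunks per shard for the collection, and it treats the collection as balanced when the counts differ by no more than a small threshold.

```python
import time
from typing import Callable, Dict


def is_balanced(chunks_per_shard: Dict[str, int], max_spread: int = 1) -> bool:
    """Return True when per-shard chunk counts differ by at most max_spread.

    The mapping must include every shard, with 0 for shards holding no chunks.
    The threshold is an assumption for illustration, not the balancer's rule.
    """
    if not chunks_per_shard:
        return True
    counts = list(chunks_per_shard.values())
    return max(counts) - min(counts) <= max_spread


def wait_for_balancing(
    fetch_chunk_counts: Callable[[], Dict[str, int]],
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    max_spread: int = 1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the collection looks balanced or the timeout expires.

    fetch_chunk_counts is a hypothetical callable supplied by the application
    (for example, one that reads the cluster's chunk metadata).
    Returns True if balance was observed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_balanced(fetch_chunk_counts(), max_spread):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

An application would call `wait_for_balancing(...)` right after sharding the new collection and start inserting documents only if it returns `True`. Returning `False` on timeout, rather than blocking indefinitely, reflects the "as complete as it will get" caveat in the proposal.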

Assignee: [DO NOT USE] Backlog - Sharding EMEA
Reporter: Steven Hand (Inactive)
Participants: [DO NOT USE] Backlog - Sharding EMEA, Asya Kamsky, Kaloian Manassiev, Steven Hand
Votes: 0
Watchers: 10

Created: Oct 13 2016 03:30:08 PM UTC
Updated: Dec 06 2022 04:13:57 AM UTC
Resolved: Nov 15 2021 03:13:24 PM UTC
