- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Fully Compatible
- v6.0
- Sharding EMEA 2022-07-11, Sharding EMEA 2022-07-25, Sharding EMEA 2022-08-08
- 35
- 3
Preamble
As part of a range deletion:
1. A batch of documents belonging to the orphaned range is deleted (each document in the batch is deleted in a single delete).
2. The counter of orphans is updated accordingly, which also triggers an update of the orphans counter on the BalancerStatsRegistry.
Note that steps 1 and 2 do not happen in the same storage transaction and, even if they did, the update of the in-memory stats registry would still not happen atomically.
The balancer considers the cluster balanced if the following condition is met:
size(most loaded shard) <= size(least loaded shard) + 2 * chunkSize
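The condition above can be sketched as a small predicate. This is only an illustration and not the server's implementation: the function name `is_balanced` and the use of plain MB integers are assumptions made for the example.

```python
def is_balanced(shard_sizes_mb, chunk_size_mb):
    """Return True when the most loaded shard is within 2 * chunkSize
    of the least loaded shard (hypothetical helper, sizes in MB)."""
    return max(shard_sizes_mb) <= min(shard_sizes_mb) + 2 * chunk_size_mb


# With a 128MB chunk size, shards of 144MB and 280MB differ by 136MB,
# which is below the 256MB threshold.
balanced = is_balanced([144, 280], 128)
```

Note that the comparison is inclusive: a difference of exactly 2 * chunkSize still counts as balanced.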
Problem
If a balancing round runs on an already balanced collection while a range deletion is ongoing, the balancer may retrieve statistics from the shards between steps 1 and 2. This may trigger an unnecessary migration (no big deal, as the chunk would later be migrated back, but it is still an unnecessary operation).
Example
Given a chunk size of 128MB for a given collection, if the collection size on two shards A and B differs by less than 256MB, the balancer should not move any data.
However, the following could happen:
- ShardA has 400MB, ShardB has 0MB
- Move a range of 128MB from A to B
- Move a range of 128MB from A to B
- ShardB receives a bit of inserts, let's say 24MB
At this point the collection is balanced because (280 - 144 = 136) is less than (2 * chunkSize = 256):
- ShardA still has 400MB: 144MB (actual data size) plus 256MB (orphans)
- ShardB has 280MB (a bit more than 256MB, accounting also for some inserts)
The range deleter kicks in:
- Starts deleting the orphans for one range...
- The range deleter has so far deleted 125MB
- The balancer asks all the shards what their data size is:
  - Shard A will reply (400 [original size] - 125 [docs already deleted]) - (128 * 2 [tracked number of orphaned docs]). So it replies 400 - 125 - 256 = 19MB
  - Shard B will reply 280MB
- The balancer performs its calculation on the chunkSize: (280 - 19 > 2 * chunkSize) ? YES because (261 > 256)
- A range is wrongly moved from shard B to shard A
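The arithmetic of the example can be replayed with a short sketch. It assumes, based on the reply described above, that a shard reports its on-disk size minus the tracked orphan counter. The names `reported_size` and `needs_migration` are hypothetical and all sizes are in MB.

```python
def reported_size(on_disk_mb, tracked_orphans_mb):
    """Size a shard reports: on-disk data minus the tracked orphan counter."""
    return on_disk_mb - tracked_orphans_mb


def needs_migration(shard_sizes_mb, chunk_size_mb):
    """True when the balancing condition is violated."""
    return max(shard_sizes_mb) > min(shard_sizes_mb) + 2 * chunk_size_mb


CHUNK_SIZE = 128
shard_b = 280

# State after the two migrations: 400MB on disk, of which 256MB are orphans.
shard_a_on_disk = 400
shard_a_orphans = 2 * 128
before = reported_size(shard_a_on_disk, shard_a_orphans)   # 144

# Step 1 has deleted 125MB of documents, but step 2 (the counter update)
# has not been applied yet, so the orphan counter is stale.
shard_a_on_disk -= 125
stale = reported_size(shard_a_on_disk, shard_a_orphans)    # 19

# Once the counter catches up, the reported size is consistent again.
shard_a_orphans -= 125
consistent = reported_size(shard_a_on_disk, shard_a_orphans)  # 144

wrong_decision = needs_migration([stale, shard_b], CHUNK_SIZE)        # 261 > 256
right_decision = needs_migration([consistent, shard_b], CHUNK_SIZE)   # 136 <= 256
```

Only the reading taken between the two steps violates the condition. The readings before the deletion and after the counter update both give 144MB for shard A, so no migration would be scheduled.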
- duplicates
  - SERVER-67171 insert_with_data_size_aware_balancing.js fail: returning balancerComplaint as true when chunks are not fully balanced because orphanedCount is stale (Closed)
- is related to
  - SERVER-68777 BalancerCollectionStatus may report balancerCompliant too early (Closed)