Core Server / SERVER-67301

Balancer may perform one unnecessary migration for a completely balanced collection

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.0.3, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Fully Compatible
    • v6.0
    • Sprint: Sharding EMEA 2022-07-11, Sharding EMEA 2022-07-25, Sharding EMEA 2022-08-08
    • 35
    • 3

      Preamble

      As part of a range deletion:

      1. A batch of documents belonging to the orphaned range is deleted (each document in the batch is deleted in a single delete).
      2. The orphan counter is updated accordingly, which also triggers an update of the orphan counter in the BalancerStatsRegistry.

      Note that steps 1 and 2 do not happen in the same storage transaction and, even if they did, the update of the in-memory stats registry would still not happen atomically.
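
The window between the two steps can be illustrated with a minimal Python sketch. It assumes, as the example later in this ticket does, that a shard reports its stored size minus the tracked orphan counter. `ShardStats` and `delete_batch` are hypothetical names used only for illustration; they are not server code.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    stored_mb: int           # everything on disk, orphans included
    tracked_orphans_mb: int  # orphan counter as seen by the stats registry

    def reported_size_mb(self) -> int:
        # Assumption: the size handed to the balancer excludes tracked orphans.
        return self.stored_mb - self.tracked_orphans_mb


def delete_batch(shard: ShardStats, batch_mb: int) -> tuple[int, int]:
    """Apply step 1 and step 2 as two separate updates.

    Returns the size the shard would report after each step.
    """
    shard.stored_mb -= batch_mb            # step 1: documents are deleted
    after_step_1 = shard.reported_size_mb()
    shard.tracked_orphans_mb -= batch_mb   # step 2: counter catches up
    after_step_2 = shard.reported_size_mb()
    return after_step_1, after_step_2


shard = ShardStats(stored_mb=400, tracked_orphans_mb=256)
before = shard.reported_size_mb()
between, after = delete_batch(shard, batch_mb=10)
print(before, between, after)
```

A reader that samples the shard between the two updates sees a size that is too small by the amount already deleted but not yet subtracted from the counter.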

      The balancer considers the cluster balanced if the following condition is met:

      size(most loaded shard) <= size(least loaded shard) + 2 * chunkSize
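
As a rough sketch, the condition above can be written as a small Python predicate. The function name and the choice of megabytes as the unit are assumptions made for illustration; the actual balancer code is not shown in this ticket.

```python
def is_balanced(shard_sizes_mb: list[int], chunk_size_mb: int) -> bool:
    """Return True when the most loaded shard exceeds the least loaded
    shard by at most two chunk sizes (the condition stated above)."""
    return max(shard_sizes_mb) <= min(shard_sizes_mb) + 2 * chunk_size_mb


print(is_balanced([144, 280], chunk_size_mb=128))
print(is_balanced([19, 280], chunk_size_mb=128))
```

With a 128MB chunk size, the first call sees a difference of 136MB and the second a difference of 261MB, so only the first pair satisfies the condition.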
      

       

      Problem

      If a balancing round runs on an already balanced collection while a range deletion is ongoing, the balancer may retrieve statistics from the shards between steps 1 and 2 above. This may trigger an unnecessary migration (no big deal, as the chunk would later be migrated back, but it is still an unnecessary operation).

      Example

      Given a chunk size of 128MB for a given collection, if the collection size on two shards A and B differs by less than 256MB, the balancer should not move any data.

      However, the following could happen:

      • ShardA has 400MB, ShardB has 0MB
      • Move a range of 128MB from A to B
      • Move a range of 128MB from A to B
      • ShardB receives a bit of inserts, let's say 24MB

      At this point the collection is balanced because (280 - 144 = 136) is less than (2 * chunkSize = 256):

      • ShardA still has 400MB: 144MB (actual data size) plus 256MB (orphans)
      • ShardB has 280MB (a bit more than 256MB, accounting for the inserts as well)

      The range deleter kicks in:

      • Starts deleting the orphans for one range...
      • The range deleter has so far deleted 125MB
      • The balancer asks all the shards for their data size:
        • Shard A will reply (400 [original size] - 125 [docs already deleted]) - (128 * 2 [tracked number of orphaned docs])
          So it replies 400 - 125 - 256 = 19MB
        • Shard B will reply 280MB
      • The balancer performs its calculation on the chunkSize: (280 - 19 > 2 * chunkSize) ? YES because (261 > 256)
      • A range is wrongly moved from shard B to shard A
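
The arithmetic of this example can be replayed in a few lines of Python. This is only a sketch of the numbers given above: the variable names and the `needs_migration` helper are hypothetical, and the helper is simply the negation of the balance condition from the preamble.

```python
CHUNK_SIZE_MB = 128


def needs_migration(size_a_mb: int, size_b_mb: int,
                    chunk_size_mb: int = CHUNK_SIZE_MB) -> bool:
    """True when the two sizes differ by more than two chunk sizes."""
    return abs(size_a_mb - size_b_mb) > 2 * chunk_size_mb


stored_a_mb = 400           # shard A on disk: 144MB owned + 256MB orphans
tracked_orphans_a_mb = 256  # orphan counter, not yet updated by the deleter
size_b_mb = 256 + 24        # two received ranges plus 24MB of inserts

# Consistent view, before the range deleter starts.
consistent_a_mb = stored_a_mb - tracked_orphans_a_mb

# Stale view: 125MB already deleted, counter still at 256MB.
deleted_so_far_mb = 125
stale_a_mb = (stored_a_mb - deleted_so_far_mb) - tracked_orphans_a_mb

print(consistent_a_mb, needs_migration(consistent_a_mb, size_b_mb))
print(stale_a_mb, needs_migration(stale_a_mb, size_b_mb))
```

The consistent view gives 144MB for shard A and no migration, while the stale view gives 19MB and pushes the difference to 261MB, just over the 256MB threshold.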

            Assignee:
            silvia.surroca@mongodb.com Silvia Surroca
            Reporter:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli