Core Server / SERVER-78052

Properly handle conflict between balancer splitting due to zoning and auto-merger

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 7.0.0-rc7
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Assigned Teams: Sharding EMEA
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v7.0
    • Sprint: Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10

      The auto-merger currently runs on a secondary thread, concurrently with the balancer thread. Its behavior can be summarized as follows:

      • (1) while the balancer is enabled:
        • (2) while there are <collection, shard> pairs with mergeable chunks (mergeability requirements documented in DOCS-15976)
          • (3) for each <collection, shard> pair discovered by (2):
            • (4) squash together mergeable chunks
            • (5) sleep for 15 seconds
        • (6) sleep for 1 hour
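
      The loop above can be sketched as follows. This is only an illustration of steps (1) to (6), not the server's implementation: the callback names (balancer_enabled, find_mergeable, merge_chunks, sleep) are hypothetical and are injected so the control flow can be exercised without real sleeps.

```python
from typing import Callable, List, Tuple

# A (collection namespace, shard id) pair, as in steps (2) and (3).
Pair = Tuple[str, str]

INTRA_MERGE_SLEEP_SECS = 15   # step (5)
ROUND_SLEEP_SECS = 60 * 60    # step (6)


def auto_merger_loop(
    balancer_enabled: Callable[[], bool],       # hypothetical callback
    find_mergeable: Callable[[], List[Pair]],   # hypothetical callback
    merge_chunks: Callable[[str, str], None],   # hypothetical callback
    sleep: Callable[[float], None],             # injected, e.g. time.sleep
) -> None:
    """Toy rendition of the auto-merger loop described in steps (1)-(6)."""
    while balancer_enabled():                       # (1)
        pairs = find_mergeable()                    # (2)
        while pairs:
            for collection, shard in pairs:         # (3)
                merge_chunks(collection, shard)     # (4)
                sleep(INTRA_MERGE_SLEEP_SECS)       # (5)
            pairs = find_mergeable()                # re-evaluate (2)
        sleep(ROUND_SLEEP_SECS)                     # (6)
```

      Note that the 1-hour sleep of (6) is only reached once (2) finds nothing left to merge. As long as new mergeable chunks keep appearing, the loop stays in the 15-second cycle.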

      As part of a balancing round, the balancer splits chunks at the boundaries of the configured zones so that they can then be moved off. Since a split does not change chunk ownership, two or more chunks produced by a split are always mergeable as long as they have resided on the same shard for at least the history window (defined in DOCS-15976).
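
      A simplified sketch of why freshly split chunks are immediately mergeable. It models only the conditions mentioned in this ticket (contiguity, same shard, history window); the full requirements are in DOCS-15976. The Chunk type, its integer keys, and the on_shard_since field are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Chunk:
    # Hypothetical, simplified chunk: integer key range [min_key, max_key).
    min_key: int
    max_key: int
    shard: str
    on_shard_since: float  # time (seconds) the range last changed owner


def split(chunk: Chunk, at: int) -> Tuple[Chunk, Chunk]:
    """Split a chunk in two; ownership and on_shard_since are unchanged."""
    if not chunk.min_key < at < chunk.max_key:
        raise ValueError("split point must fall strictly inside the chunk")
    left = Chunk(chunk.min_key, at, chunk.shard, chunk.on_shard_since)
    right = Chunk(at, chunk.max_key, chunk.shard, chunk.on_shard_since)
    return left, right


def mergeable(left: Chunk, right: Chunk, now: float,
              history_window_secs: float) -> bool:
    """Simplified check: contiguous, same shard, both past the window."""
    contiguous = left.max_key == right.min_key
    same_shard = left.shard == right.shard
    settled = all(now - c.on_shard_since >= history_window_secs
                  for c in (left, right))
    return contiguous and same_shard and settled
```

      Because split() carries over the shard and the ownership timestamp, the two halves of a long-resident chunk satisfy mergeable() as soon as the split completes. They stop being mergeable only once one of them is moved to another shard.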

      The conflict between the balancer splitting chunks for zoning and the auto-merger squashing together mergeable chunks had been considered acceptable based on the following reasoning:

      • The auto-merger may merge chunks belonging to different zones that are currently residing on the same shard
      • However, the auto-merger will then "go to sleep" for 1 hour
      • This leaves enough time for the balancer to split again and keep on moving data (avoiding future merges)

      It turns out that, because splits are extremely slow when there are several hundred zones, there is an interleaving that leads to the following continuous conflict between the balancer and the auto-merger:

      • (A) The balancer starts splitting chunks
      • (B) The auto-merger discovers mergeable chunks due to (2)
      • (C) Due to (4), the auto-merger squashes together chunks that were just split because of (A)
      • (D) The auto-merger sleeps for 15 seconds due to (5) while (A) is still running, then discovers newly split chunks due to (2)
      • (E) The balancer finishes (A), but some of the split chunks have already been merged back
      • Back to (A), repeat
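
      The interleaving (A) to (E) can be reproduced with a toy discrete-time model. It assumes a single collection on a single shard, a balancer that creates one zone boundary every secs_per_split seconds, and an auto-merger that wakes every 15 seconds and merges away every boundary created so far. The function name and parameters are hypothetical and the timings are illustrative, not measured.

```python
from typing import Set


def simulate_round(num_boundaries: int, secs_per_split: int,
                   merge_period_secs: int = 15) -> Set[int]:
    """Return the zone boundaries still split when the balancer ends (A)."""
    present: Set[int] = set()
    next_merge = merge_period_secs
    now = 0
    for boundary in range(num_boundaries):
        now += secs_per_split  # (A) one slow split completes at time `now`
        # (B)-(D): every auto-merger wake-up up to `now` squashes all the
        # boundaries the balancer has created so far.
        while next_merge <= now:
            present.clear()
            next_merge += merge_period_secs
        present.add(boundary)
    return present  # (E) whatever survived the merges


if __name__ == "__main__":
    survivors = simulate_round(num_boundaries=300, secs_per_split=1)
    print(f"{len(survivors)} of 300 boundaries survived the round")
```

      In this model, a splitting phase shorter than 15 seconds keeps every boundary, while one that spans many auto-merger wake-ups ends with almost all boundaries merged back, so the next balancing round has to start splitting again.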

            Assignee:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli
            Reporter:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli