  1. Core Server
  2. SERVER-78115

Shard primaries must commit a majority write before using new routing information from the config server

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 7.0.3, 6.0.12, 5.0.23
    • Affects Version/s: 3.6.23, 4.0.28, 4.2.24, 7.1.0-rc0, 6.0.6, 4.4.22, 5.0.18, 7.0.0-rc3
    • Component/s: None
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v7.0, v6.0, v5.0
    • Sprint: Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04
    • 152

      In SERVER-30797, a majority write was added to the refresh path on primaries, performed after fetching new routing information from the config server. This write ensured that the node that fetched the routing information was actually the majority primary, preventing incorrect filtering information from being applied in split-brain scenarios.

      This write was removed in SERVER-35092 since it was believed to be unnecessary and was causing stalls when a refresh happened without a majority of nodes available.

      However, the split-brain scenario that the majority write was added to prevent is still a problem, and since the write was removed it is possible to hit it again. The scenario is as follows:

      • Suppose we have a 2-shard cluster with 3 nodes per shard, where chunk (min, 0) is on shard 0 and chunk (0, max) is on shard 1, with one document in each chunk.
      • A network partition separates the primary of shard 0 from its secondaries, and one of those secondaries steps up (creating a split-brain scenario).
      • Chunk (0, max) is moved from shard 1 to shard 0.
      • A mongoS that hasn't learned about the new primary on shard 0 routes a majority read to the old primary.
      • The old primary (which still believes itself to be primary) fetches the new routing information from the config server.

      In this case, the old primary will respond to the majority read using the newest filtering information but without ever having seen the chunk migration, so its response can omit the document in (0, max) that the filtering information says it owns.
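
      The scenario above can be reproduced with a toy model. This is only an illustrative sketch, not server code: the `Node` class, the `routing_for_shard0` table, and the shard key values -5 and 5 are all hypothetical, and chunk ranges are treated as half-open intervals [lo, hi).

```python
from dataclasses import dataclass, field

MIN, MAX = float("-inf"), float("inf")


@dataclass
class Node:
    """Hypothetical shard member holding documents identified by shard key."""
    name: str
    docs: set = field(default_factory=set)
    # Chunk ranges [lo, hi) this node believes its shard owns (filtering info).
    owned: list = field(default_factory=list)

    def read(self):
        """Return the documents inside the ranges this node believes it owns."""
        return sorted(d for d in self.docs
                      if any(lo <= d < hi for lo, hi in self.owned))


# Hypothetical config server view: routing version -> ranges owned by shard 0.
routing_for_shard0 = {1: [(MIN, 0)], 2: [(MIN, MAX)]}

# Before the partition both members agree: shard 0 owns (min, 0) and holds -5.
old_primary = Node("shard0-old-primary", {-5}, routing_for_shard0[1])
new_primary = Node("shard0-new-primary", {-5}, routing_for_shard0[1])

# After the partition the migration of (0, max) commits through the new
# primary only, so only that node receives document 5.
new_primary.docs.add(5)
new_primary.owned = routing_for_shard0[2]

# The old primary can still reach the config server and refreshes without
# proving that it is the majority primary.
old_primary.owned = routing_for_shard0[2]

stale_result = old_primary.read()    # newest filtering info, stale data
correct_result = new_primary.read()  # newest filtering info, migrated data
```

      In this model `stale_result` contains only the document from (min, 0), while `correct_result` contains both documents: the old primary claims ownership of (0, max) without holding its data.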

      This can also affect secondaries that refresh via the node that believes itself to be primary, leaving their filtering information ahead of the data they hold.

      The solution is to restore the majority no-op write in the ShardServerCatalogCacheLoader (SSCCL). This ensures that newly found filtering information can only be used, and sent to secondaries, by the actual primary of the replica set.
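
      As a rough sketch of the idea only (not the actual SSCCL code), the refresh path can be modelled as "majority-commit a no-op, then install the new routing information". The `ShardPrimary` class, its fields, and the `NotMajorityPrimary` exception are hypothetical names, and counting reachable members stands in for a real majority write concern.

```python
class NotMajorityPrimary(Exception):
    """Raised when the no-op write cannot be majority-committed."""


class ShardPrimary:
    """Hypothetical model of a node that believes itself to be primary."""

    def __init__(self, reachable_members, replica_set_size, routing_version):
        self.reachable_members = reachable_members  # includes this node
        self.replica_set_size = replica_set_size
        self.routing_version = routing_version

    def majority_noop_write(self):
        # Stand-in for a no-op oplog entry written with majority write concern.
        if self.reachable_members <= self.replica_set_size // 2:
            raise NotMajorityPrimary("cannot majority-commit the no-op write")

    def refresh(self, config_version):
        if config_version > self.routing_version:
            # Prove primacy before using or forwarding the newer information.
            self.majority_noop_write()
            self.routing_version = config_version
        return self.routing_version


# Partitioned old primary: reaches only itself in a 3-node replica set.
isolated = ShardPrimary(reachable_members=1, replica_set_size=3, routing_version=1)
try:
    isolated.refresh(config_version=2)
    refresh_failed = False
except NotMajorityPrimary:
    refresh_failed = True

# Actual primary: reaches itself and one secondary, which is a majority of 3.
actual = ShardPrimary(reachable_members=2, replica_set_size=3, routing_version=1)
actual_version = actual.refresh(config_version=2)
```

      The partitioned node keeps its old routing version because the refresh fails before the new information is installed. This is also the trade-off noted for SERVER-35092: a refresh stalls or fails whenever a majority of nodes is unavailable.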

            allison.easton@mongodb.com Allison Easton