SERVER-81966

Avoid modification of previous ChunkMap instances during refresh

    • Fully Compatible
    • ALL
    • v7.1, v7.0, v6.0, v5.0, v4.4, v4.2
    • Sharding EMEA 2023-10-16

      Bug description

      During a routing table refresh, we create an updated ChunkMap from an existing one (copy on write). It is important that the existing ChunkMap remains untouched and valid while the new one is being created.

      The current update algorithm is affected by a bug that can cause a vector of the original ChunkMap to be erased.

      This happens in ChunkMap::_mergeAndCommitUpdatedChunkVector, where we std::move the chunkInfo pointers from the old vector to the new one.
      The old vector has not been copied at that point, so it is still shared with other ChunkMap instances. To preserve its integrity, we should copy the pointers instead of moving them.
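
      The difference between moving and copying the pointers can be illustrated with a minimal, self-contained sketch. It is not the server code: ChunkInfo, ChunkVector, mergeByMove, mergeByCopy and demonstrate are simplified, hypothetical stand-ins, and the only behavior relied upon is that a moved-from std::shared_ptr is left null.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real server types.
struct ChunkInfo {
    int id;
};
using ChunkVector = std::vector<std::shared_ptr<ChunkInfo>>;

// Buggy pattern: the pointers are moved out of `shared`, so every entry of
// the vector that other readers still hold becomes null.
ChunkVector mergeByMove(ChunkVector& shared) {
    ChunkVector merged;
    for (auto& ptr : shared) {
        merged.push_back(std::move(ptr));
    }
    return merged;
}

// Fixed pattern: the pointers are copied, so `shared` is left untouched and
// both vectors co-own the same ChunkInfo objects.
ChunkVector mergeByCopy(const ChunkVector& shared) {
    ChunkVector merged;
    for (const auto& ptr : shared) {
        merged.push_back(ptr);
    }
    return merged;
}

// Usage: the old vector must still be readable after the new one is built.
void demonstrate() {
    ChunkVector oldVector{std::make_shared<ChunkInfo>(ChunkInfo{1}),
                          std::make_shared<ChunkInfo>(ChunkInfo{2})};

    ChunkVector copied = mergeByCopy(oldVector);
    assert(copied.size() == 2);
    assert(oldVector[0] != nullptr && oldVector[0]->id == 1);  // still intact

    ChunkVector moved = mergeByMove(oldVector);
    assert(moved[0]->id == 1);
    assert(oldVector[0] == nullptr);  // entry wiped for any other reader
}
```

      In the sketch, a reader that still holds the old vector after mergeByMove would dereference a null pointer, which mirrors the harm described in this ticket when the original routing table is accessed after the refresh.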

      Conditions to trigger the bug

      Several conditions need to apply in order to trigger this bug:

      • At least one merge chunk operation must have happened between one routing table refresh and the subsequent one.
      • The merge chunk operation needs to happen on the last ChunkVector of the ChunkMap (i.e., it needs to be toward the end of the routing table).
      • The merge operation needs to reduce the size of the last ChunkVector to less than half of the configured maximum chunk vector size.

      Additionally, for this bug to cause any harm, the original RoutingTable needs to be accessed after the refreshed one is constructed. This usually happens with long-lasting requests or with a very high frequency of quick requests.

      Affected versions

      • [7.1.0-rc0, 7.2.0]
      • [7.0.1, 7.0.2]
      • [6.0.10, 6.0.11]
      • [5.0.21]
      • [4.4.25]
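
      As a convenience, the ranges above can be encoded in a small check. This is only a sketch under assumptions: Version, Range, kAffected, parseVersion and isAffected are hypothetical names, the ranges are read as inclusive, and any pre-release suffix such as "-rc0" is ignored, so that "7.1.0-rc0" is compared as 7.1.0.

```cpp
#include <cstdio>
#include <string>
#include <tuple>

// Hypothetical helper types; not part of any MongoDB tool.
struct Version {
    int majorNum, minorNum, patchNum;
};

struct Range {
    Version low, high;  // both bounds inclusive
};

// Ranges copied from the "Affected versions" list in this ticket.
const Range kAffected[] = {
    {{7, 1, 0}, {7, 2, 0}},
    {{7, 0, 1}, {7, 0, 2}},
    {{6, 0, 10}, {6, 0, 11}},
    {{5, 0, 21}, {5, 0, 21}},
    {{4, 4, 25}, {4, 4, 25}},
};

bool lessOrEqual(const Version& a, const Version& b) {
    return std::tie(a.majorNum, a.minorNum, a.patchNum) <=
           std::tie(b.majorNum, b.minorNum, b.patchNum);
}

// Parses "X.Y.Z" and ignores anything after the patch number.
bool parseVersion(const std::string& text, Version& out) {
    return std::sscanf(text.c_str(), "%d.%d.%d",
                       &out.majorNum, &out.minorNum, &out.patchNum) == 3;
}

// Returns true when the version falls inside one of the listed ranges.
// Unparseable input yields false here; a real tool should report an error.
bool isAffected(const std::string& text) {
    Version v{};
    if (!parseVersion(text, v)) {
        return false;
    }
    for (const Range& r : kAffected) {
        if (lessOrEqual(r.low, v) && lessOrEqual(v, r.high)) {
            return true;
        }
    }
    return false;
}
```

      For example, under these assumptions isAffected("7.0.2") is true while isAffected("7.0.3") is false.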

      Remediations

      Chunk merges are a prerequisite for hitting this bug, so the way to avoid triggering it is to stop all chunk merge activity and restart all the binaries in the cluster (both mongod and mongos).

      Version >= 7.0

      1. Disable auto-merger:
        Use the sh.disableAutoMerger() shell helper or directly update the "config.settings" collection:
        db.getSiblingDB("config").settings.update(
                {_id: 'automerge'},
                {$set: {enabled: false}},
                {upsert: true, writeConcern: {w: 'majority'}}
        );
        
      2. Stop defragmentation for all collections.
      3. Stop performing manual chunk merges.
      4. Restart all binaries
        • All mongod and mongos processes

      Version 6.0

      1. Stop defragmentation for all collections.
      2. Stop performing manual chunk merges.
      3. Restart all binaries
        • All mongod and mongos processes

      Version <= 5.0

      In these versions, the balancer does not perform any automatic chunk merges. The only users who can be affected, and who need to take the remediation steps, are those who executed at least one manual chunk merge.

      1. Stop performing manual chunk merges.
      2. Restart all binaries
        • All mongod and mongos processes

            Assignee:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Reporter:
            tommaso.tocci@mongodb.com Tommaso Tocci