- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 4.2.25, 7.0.1, 6.0.10, 5.0.21, 7.2.0-rc0, 7.1.0
- Component/s: None
- Catalog and Routing
- Fully Compatible
- ALL
- v7.1, v7.0, v6.0, v5.0, v4.4, v4.2
- Sharding EMEA 2023-10-16
Bug description
During routing table refresh, we create an updated ChunkMap from an existing one (copy on write). It is important that during the creation of the new ChunkMap the existing one remain untouched and valid.
The current update algorithm has a bug that can cause a vector of the original ChunkMap to be erased.
This happens in ChunkMap::_mergeAndCommitUpdatedChunkVector, where we std::move the chunkInfo pointers from the old vector to the new one.
At that point the old vector has not been copied, so it is still shared with other ChunkMap instances. To preserve its integrity, we should copy the pointers instead of moving them.
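The difference between moving and copying the pointers can be illustrated with a minimal sketch. This is not the server code: ChunkInfo, ChunkVector, mergeByMoving, mergeByCopying, countValid, makeVector and demonstrate are hypothetical stand-ins, and the sketch only assumes that the vector holds std::shared_ptr elements. A moved-from std::shared_ptr is null, so any reader still holding the old vector sees empty entries.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real types; only the ownership pattern matters.
struct ChunkInfo {
    int minKey;
};
using ChunkVector = std::vector<std::shared_ptr<ChunkInfo>>;

// Buggy pattern: moves the pointers out of a vector that other maps still share.
// Every moved-from std::shared_ptr in sharedOld becomes null.
ChunkVector mergeByMoving(ChunkVector& sharedOld) {
    ChunkVector merged;
    merged.reserve(sharedOld.size());
    for (auto& chunk : sharedOld) {
        merged.push_back(std::move(chunk));
    }
    return merged;
}

// Fixed pattern: copies the pointers, so the shared vector stays valid.
ChunkVector mergeByCopying(const ChunkVector& sharedOld) {
    ChunkVector merged;
    merged.reserve(sharedOld.size());
    for (const auto& chunk : sharedOld) {
        merged.push_back(chunk);
    }
    return merged;
}

// Counts the entries that a reader of the old map could still dereference.
std::size_t countValid(const ChunkVector& v) {
    std::size_t n = 0;
    for (const auto& chunk : v) {
        if (chunk) {
            ++n;
        }
    }
    return n;
}

// Builds a vector of n chunks with arbitrary keys 0..n-1.
ChunkVector makeVector(int n) {
    ChunkVector v;
    for (int i = 0; i < n; ++i) {
        v.push_back(std::make_shared<ChunkInfo>(ChunkInfo{i}));
    }
    return v;
}

// Usage example: copying leaves the old vector intact, moving empties it.
void demonstrate() {
    ChunkVector shared = makeVector(3);
    ChunkVector copied = mergeByCopying(shared);
    assert(copied.size() == 3);
    assert(countValid(shared) == 3);  // old vector still usable
    ChunkVector moved = mergeByMoving(shared);
    assert(countValid(moved) == 3);
    assert(countValid(shared) == 0);  // old vector now holds only null pointers
}
```

Copying costs one reference-count increment per pointer, which is the price of keeping the shared vector readable by the ChunkMap instances that still reference it.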
Conditions to trigger the bug
Several conditions need to apply in order to trigger this bug:
- At least one merge chunk operation must have happened between one routing table refresh and the subsequent one.
- The merge chunk operation needs to happen on the last ChunkVector of the ChunkMap (that is, toward the end of the RoutingTable).
- The merge operation needs to reduce the size of the last ChunkVector to less than half of the configured max chunk vector size.
Additionally, for this bug to cause any harm, the original RoutingTable needs to be accessed after the refreshed one is constructed, which usually happens with long-lasting requests or with a very high frequency of quick requests.
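The three trigger conditions can be read as a single conjunction. The following is only an illustrative sketch: RefreshObservation, mayTriggerBug and example are hypothetical names that do not exist in the server, and the sizes used are arbitrary example values, not the configured default.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of what happened between two routing table refreshes.
struct RefreshObservation {
    bool mergeSinceLastRefresh;           // at least one merge chunk operation occurred
    bool mergeTouchedLastVector;          // the merge affected the last ChunkVector
    std::size_t lastVectorSizeAfterMerge; // size of the last ChunkVector after the merge
    std::size_t maxChunkVectorSize;       // configured max chunk vector size
};

// True only when all three conditions listed above hold.
bool mayTriggerBug(const RefreshObservation& obs) {
    // "less than half" expressed without integer division
    const bool belowHalf = obs.lastVectorSizeAfterMerge * 2 < obs.maxChunkVectorSize;
    return obs.mergeSinceLastRefresh && obs.mergeTouchedLastVector && belowHalf;
}

// Usage example with arbitrary sizes.
void example() {
    assert(mayTriggerBug({true, true, 3, 8}));    // 3 is less than half of 8
    assert(!mayTriggerBug({true, true, 4, 8}));   // exactly half does not qualify
    assert(!mayTriggerBug({true, false, 3, 8}));  // merge was not on the last vector
}
```

Even when the predicate holds, harm only follows if the original RoutingTable is still read after the refreshed one has been built.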
Affected versions
- [7.1.0-rc0, 7.2.0]
- [7.0.1, 7.0.2]
- [6.0.10, 6.0.11]
- [5.0.21]
- [4.4.25]
Remediations
Chunk merges are a prerequisite for hitting this bug, so the way to avoid triggering it is to stop all chunk merge activity and restart all the binaries in the cluster (both mongod and mongos).
Version >= 7.0
- Disable auto-merger:
Use the sh.disableAutoMerger() shell helper or directly update the "config.settings" collection:
    db.getSiblingDB("config").settings.update( {_id: 'automerge'}, {$set: {enabled: false}}, {upsert: true, writeConcern: {w: 'majority'}} );
- Stop defragmentations for all collections
- Stop performing manual chunk merges.
- Restart all binaries (all mongod and mongos processes)
Version 6.0
- Stop defragmentations for all collections
- Stop performing manual chunk merges.
- Restart all binaries (all mongod and mongos processes)
Version <= 5.0
In these versions, the balancer does not perform any automatic chunk merges, so the only users who can be affected, and who need to take the remediation steps, are those who executed at least one manual chunk merge.
- Stop performing manual chunk merges.
- Restart all binaries (all mongod and mongos processes)
- is caused by: SERVER-71627 Refreshed cached collection route info will severely block all client request when a cluster with 1 million chunks (Closed)