[SERVER-81966] Avoid modification of previous ChunkMap instances during refresh Created: 08/Oct/23  Updated: 06/Dec/23  Resolved: 09/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.25, 7.0.1, 6.0.10, 5.0.21, 7.2.0-rc0, 7.1.0
Fix Version/s: 4.2.25, 7.1.1, 7.2.0-rc0, 5.0.22, 7.0.3, 4.4.26, 6.0.12

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: balancer-round-perf
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
is caused by SERVER-71627 Refreshed cached collection route inf... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.1, v7.0, v6.0, v5.0, v4.4, v4.2
Sprint: Sharding EMEA 2023-10-16
Participants:
Case:

 Description   

Bug description

During a routing table refresh, we create an updated ChunkMap from an existing one (copy on write). It is important that the existing ChunkMap remains untouched and valid while the new one is being created.

The current update algorithm has a bug that can cause a vector of the original ChunkMap to be erased.

This happens in ChunkMap::_mergeAndCommitUpdatedChunkVector, where we std::move the chunkInfo pointers from the old vector to the new one.
The old vector has not been copied at that point, so it is still shared with other ChunkMap instances. To preserve its integrity, we should copy the pointers instead of moving them.
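
The difference between moving and copying the pointers can be illustrated with a minimal, self-contained sketch. It is not the server code: ChunkInfo, ChunkVector, mergeByMove, mergeByCopy and countNullEntries are hypothetical stand-ins, and the sketch assumes the chunk pointers behave like std::shared_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the server types; the real definitions differ.
struct ChunkInfo {
    int minKey;
    int maxKey;
};
using ChunkVector = std::vector<std::shared_ptr<ChunkInfo>>;

// Buggy pattern: moving the pointers out of a vector that an older snapshot
// still references leaves null entries behind in that snapshot.
ChunkVector mergeByMove(ChunkVector& previous, const ChunkVector& updated) {
    ChunkVector merged;
    merged.reserve(previous.size() + updated.size());
    merged.insert(merged.end(),
                  std::make_move_iterator(previous.begin()),
                  std::make_move_iterator(previous.end()));
    merged.insert(merged.end(), updated.begin(), updated.end());
    return merged;
}

// Safe pattern: copying the shared_ptr values only bumps reference counts,
// so the older snapshot keeps every entry intact.
ChunkVector mergeByCopy(const ChunkVector& previous, const ChunkVector& updated) {
    ChunkVector merged;
    merged.reserve(previous.size() + updated.size());
    merged.insert(merged.end(), previous.begin(), previous.end());
    merged.insert(merged.end(), updated.begin(), updated.end());
    assert(merged.size() == previous.size() + updated.size());
    return merged;
}

// Counts entries that were emptied by a move.
std::size_t countNullEntries(const ChunkVector& chunks) {
    std::size_t nulls = 0;
    for (const auto& chunk : chunks) {
        if (!chunk) ++nulls;
    }
    return nulls;
}
```

In this sketch, calling mergeByMove on a vector that another snapshot still holds leaves that snapshot with null entries, while mergeByCopy leaves it fully populated.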

Conditions to trigger the bug

Several conditions need to apply in order to trigger this bug:

  • At least one merge chunk operation must have happened between one routing table refresh and the subsequent one.
  • The merge chunk operation needs to happen on the last ChunkVector of the ChunkMap (i.e. it needs to be toward the end of the RoutingTable).
  • The merge operation needs to reduce the size of the last ChunkVector to less than half of the configured max chunk vector size.

Additionally, for this bug to cause any harm, the original RoutingTable needs to be accessed after the refreshed one is constructed. This usually happens with long-lasting requests or with a very high frequency of quick requests.
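
The three trigger conditions listed above can be written down as a small predicate. This is only a sketch of the checklist, not logic taken from the server: RefreshDelta, its fields and meetsTriggerConditions are hypothetical names, and the max chunk vector size is passed in as a plain parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of what happened between two consecutive refreshes.
struct RefreshDelta {
    std::size_t mergesSinceLastRefresh;    // merge chunk operations seen
    bool mergeTouchedLastVector;           // merge hit the last ChunkVector
    std::size_t lastVectorSizeAfterMerge;  // entries left in that vector
};

// Returns true only when all three conditions from the list hold.
bool meetsTriggerConditions(const RefreshDelta& delta,
                            std::size_t maxChunkVectorSize) {
    assert(maxChunkVectorSize > 0);
    const bool mergeHappened = delta.mergesSinceLastRefresh >= 1;
    // "Less than half" is written without division to avoid rounding.
    const bool shrankBelowHalf =
        2 * delta.lastVectorSizeAfterMerge < maxChunkVectorSize;
    return mergeHappened && delta.mergeTouchedLastVector && shrankBelowHalf;
}
```

With an assumed max chunk vector size of 8, a last vector reduced to 3 entries by a merge satisfies the predicate, while one left with 4 entries does not.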

Affected versions

  • [7.1.0-rc0, 7.2.0]
  • [7.0.1, 7.0.2]
  • [6.0.10, 6.0.11]
  • [5.0.21]
  • [4.4.25]

Remediations

Chunk merges are a prerequisite for hitting this bug, so the way to avoid triggering it is to stop all chunk merge activity and restart all the binaries in the cluster (both mongod and mongos).

Version >= 7.0

  1. Disable auto-merger:
    Use the sh.disableAutoMerger() shell helper, or update the "config.settings" collection directly:

    db.getSiblingDB("config").settings.update(
            {_id: 'automerge'},
            {$set: {enabled: false}},
            {upsert: true, writeConcern: {w: 'majority'}}
    );
    

  2. Stop defragmentations for all collections
  3. Stop performing manual chunk merges.
  4. Restart all binaries
    • All mongod and mongos processes

Version 6.0

  1. Stop defragmentations for all collections
  2. Stop performing manual chunk merges.
  3. Restart all binaries
    • All mongod and mongos processes

Version <= 5.0

In these versions, the balancer does not perform any automatic chunk merges, so the only users who can be affected, and who need to take the remediation steps, are those who executed at least one manual chunk merge.

  1. Stop performing manual chunk merges.
  2. Restart all binaries
    • All mongod and mongos processes


 Comments   
Comment by Githook User [ 19/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81966 Avoid modification of previous ChunkMap instances during refresh
Branch: v7.1
https://github.com/mongodb/mongo/commit/c0252c00305d86aa9d0f629d248d525aff7065d2

Comment by Githook User [ 11/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81966 Avoid modification of previous ChunkMap instances during refresh
Branch: v4.2
https://github.com/mongodb/mongo/commit/b9574c90d91f48285b483b48e39cb34956084009

Comment by Githook User [ 09/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81966 Avoid modification of previous ChunkMap instances during refresh
Branch: v6.0
https://github.com/mongodb/mongo/commit/4852f602416878ff56e1bd49c8a82b21ab3d3b18

Comment by Githook User [ 09/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81966 Avoid modification of previous ChunkMap instances during refresh
Branch: v4.4
https://github.com/mongodb/mongo/commit/9e97139e2e321b1daf5f1b241077010c99c389a2

Comment by Githook User [ 09/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81966 Avoid modification of previous ChunkMap instances during refresh
Branch: v5.0
https://github.com/mongodb/mongo/commit/354323215d20b625d16aa7ef153d64f11d8028cb

Comment by Githook User [ 09/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81966 Avoid modification of previous ChunkMap instances during refresh
Branch: v7.0
https://github.com/mongodb/mongo/commit/450e0d4116393a9d03c3d92f66a85148f403356a

Comment by Githook User [ 09/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-81966 Avoid modification of previous ChunkMap instances during refresh
Branch: master
https://github.com/mongodb/mongo/commit/f505e8241304bde75442b1b94023bb7c18c769d5

Generated at Thu Feb 08 06:47:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.