- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Catalog and Routing
- Fully Compatible
- ALL
- CAR Team 2025-11-24
- 1
- 🟩 Routing and Topology
In the addShard coordinator, we use listCollections to get the UUID of config.system.sessions and then issue dropCollection with that UUID. The UUID serves as replay protection, because the drop collection command that supports OSI must be run against shard nodes, and the shard identity has not yet been written on the shard being added.
The following split-brain scenario can cause us to erroneously drop the sessions collection after the coordinator completes:
- The primary is running the coordinator and gets stuck immediately before running listCollections
- An election happens and a new primary is elected, but the old primary does not realize it (split brain)
- The new primary restarts this phase and completes the coordinator
- The sessions collection gets sharded in the sharded cluster, which places some chunk on the shard that was just added
- The old primary, which still thinks it is primary, runs listCollections and then drops the new incarnation of the sessions collection
We should do a noop write after acquiring the UUID from the listCollections command in order to ensure that we were primary at the point where we got that UUID.
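The scenario and the proposed fix can be illustrated with a small toy model. This is only a sketch, not the server implementation: `ToyCluster`, `noop_write`, `drop_sessions_collection`, and `run_scenario` are hypothetical names invented for illustration, and the noop write is modeled as a simple term comparison on the assumption that a stale primary's write cannot succeed once a newer term exists.

```python
import uuid


class NotPrimaryError(Exception):
    """Raised when a node with a stale term attempts the noop write."""


class ToyCluster:
    """Hypothetical model: the replica set term plus the shard being added."""

    def __init__(self):
        self.current_term = 1
        # Collections on the shard being added: name -> UUID.
        self.shard_collections = {"config.system.sessions": uuid.uuid4()}

    def list_collections_uuid(self, name):
        return self.shard_collections.get(name)

    def drop_collection(self, name, expected_uuid):
        # The drop only takes effect if the UUID still matches.
        if self.shard_collections.get(name) == expected_uuid:
            del self.shard_collections[name]
            return True
        return False

    def noop_write(self, node_term):
        # Assumption: stands in for a noop write that a stale primary
        # cannot complete; modeled here as a term check.
        if node_term != self.current_term:
            raise NotPrimaryError(f"term {node_term} is stale")


def drop_sessions_collection(cluster, node_term, verify_primary):
    name = "config.system.sessions"
    coll_uuid = cluster.list_collections_uuid(name)
    if coll_uuid is None:
        return False
    if verify_primary:
        # Proposed fix: confirm we were primary when the UUID was read.
        cluster.noop_write(node_term)
    return cluster.drop_collection(name, coll_uuid)


def run_scenario(verify_primary):
    """Replay the split-brain steps; return True if the new incarnation survives."""
    cluster = ToyCluster()
    old_term = cluster.current_term
    # Old primary stalls before listCollections; an election bumps the term.
    cluster.current_term += 1
    # New primary restarts the phase and completes the coordinator.
    drop_sessions_collection(cluster, cluster.current_term, verify_primary)
    # The sessions collection is sharded; a chunk lands on the new shard.
    cluster.shard_collections["config.system.sessions"] = uuid.uuid4()
    # Old primary resumes and runs listCollections followed by the drop.
    try:
        drop_sessions_collection(cluster, old_term, verify_primary)
    except NotPrimaryError:
        pass
    return "config.system.sessions" in cluster.shard_collections
```

In this model, `run_scenario(False)` loses the new incarnation because the stale primary reads the new UUID and the UUID check passes, while `run_scenario(True)` keeps it because the noop write fails before the drop is issued.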
- is caused by: SERVER-102352 Add OSI support for AddShardCoordinator::_dropSessionsCollection (Closed)