Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.3.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Catalog and Routing
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: CAR Team 2025-11-24
Story Points: 1
CAR Domain/s: 🟩 Routing and Topology
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In the addShard coordinator, we use listCollections to get the UUID for config.system.sessions and then issue dropCollection with that UUID. We pass the UUID for replay protection because the drop collection command that supports OSI must be run against shard nodes, and the shard identity has not yet been written on the shard being added.
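
As a rough illustration of why a UUID gives replay protection, here is a minimal Python sketch of a drop that only succeeds when the collection still has the UUID the caller observed. It is a toy in-memory model, not the server's implementation: `ToyShard` and all of its methods are hypothetical names invented for this example.

```python
import uuid
from typing import Dict, Optional


class ToyShard:
    """Hypothetical in-memory stand-in for a shard's collection catalog."""

    def __init__(self) -> None:
        # Maps namespace -> UUID of the current incarnation of the collection.
        self.collections: Dict[str, uuid.UUID] = {}

    def create_collection(self, ns: str) -> uuid.UUID:
        # Every incarnation of a collection gets a fresh UUID.
        coll_uuid = uuid.uuid4()
        self.collections[ns] = coll_uuid
        return coll_uuid

    def list_collection_uuid(self, ns: str) -> Optional[uuid.UUID]:
        # Plays the role of listCollections in this toy model.
        return self.collections.get(ns)

    def drop_collection(self, ns: str, expected_uuid: uuid.UUID) -> bool:
        # Drop only if the namespace still refers to the expected incarnation.
        if self.collections.get(ns) != expected_uuid:
            return False
        del self.collections[ns]
        return True


shard = ToyShard()
first = shard.create_collection("config.system.sessions")
shard.drop_collection("config.system.sessions", first)   # drops the first incarnation
second = shard.create_collection("config.system.sessions")
shard.drop_collection("config.system.sessions", first)   # replayed drop: no effect
```

In this model a replayed drop carrying the old UUID cannot remove a newer incarnation. The guard only helps if the UUID was read while the caller was still entitled to act on it, which is the gap described next.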

The following split-brain scenario can cause us to erroneously drop the sessions collection after the coordinator completes:

1. The primary is running the coordinator and gets stuck immediately before running listCollections.
2. An election happens and a new primary is elected, but the old primary does not realize it (split brain).
3. The new primary restarts this phase and completes the coordinator.
4. The sessions collection gets sharded in the sharded cluster, putting some chunk on the shard which was just added.
5. The old primary, which still thinks it is primary, runs listCollections and then drops the new incarnation of the sessions collection.

We should do a noop write after acquiring the UUID from the listCollections command in order to ensure that we were primary at the point where we got that UUID.
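
The effect of the proposed no-op write can be sketched with a small Python model. This is an illustration under stated assumptions, not the actual fix: `ToyReplicaSet`, `NotPrimaryError`, `drop_sessions_collection`, and the term-based check are hypothetical stand-ins for a write that a stale primary cannot get acknowledged.

```python
import uuid
from typing import Dict


class NotPrimaryError(Exception):
    """Raised when a write comes from a node whose term is stale (toy model)."""


class ToyReplicaSet:
    """Hypothetical model that only tracks the current election term."""

    def __init__(self) -> None:
        self.current_term = 1

    def elect_new_primary(self) -> None:
        # A successful election advances the term.
        self.current_term += 1

    def noop_write(self, node_term: int) -> None:
        # Assumption: a no-op write is only acknowledged in the current term.
        if node_term != self.current_term:
            raise NotPrimaryError(f"term {node_term} is stale")


def drop_sessions_collection(catalog: Dict[str, uuid.UUID], repl: ToyReplicaSet,
                             node_term: int, fence_with_noop: bool) -> bool:
    ns = "config.system.sessions"
    coll_uuid = catalog.get(ns)          # stands in for listCollections
    if coll_uuid is None:
        return False
    if fence_with_noop:
        # Confirm we were still primary when we read the UUID.
        repl.noop_write(node_term)
    if catalog.get(ns) == coll_uuid:     # stands in for dropCollection by UUID
        del catalog[ns]
        return True
    return False


repl = ToyReplicaSet()
stale_term = repl.current_term           # old primary's view of the term
repl.elect_new_primary()                 # split brain: old primary is unaware
catalog = {"config.system.sessions": uuid.uuid4()}  # new incarnation exists

# Without the fence, the stale primary drops the new incarnation.
drop_sessions_collection(dict(catalog), repl, stale_term, fence_with_noop=False)

# With the fence, the stale primary fails before issuing the drop.
try:
    drop_sessions_collection(dict(catalog), repl, stale_term, fence_with_noop=True)
except NotPrimaryError:
    pass
```

In this model the stale node reads the new incarnation's UUID, so the UUID check alone passes; the write between the read and the drop is what stops it.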

Is caused by: SERVER-102352 Add OSI support for AddShardCoordinator::_dropSessionsCollection (Closed)

Assignee: Wolfee Farkas
Reporter: Allison Easton
Participants: Allison Easton, Githook User, Wolfee Farkas
Votes: 0
Watchers: 4

Created: Oct 28 2025 02:58:48 PM UTC
Updated: Nov 12 2025 05:12:39 AM UTC
Resolved: Nov 11 2025 03:34:25 PM UTC
