[SERVER-80249] Investigate bootstrapping sharding components from an existing replica set Created: 18/Aug/23  Updated: 29/Oct/23  Resolved: 12/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Task Priority: Major - P3
Reporter: Randolph Tan Assignee: Wenqin Ye
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-82024 [For v8.1] Remove code that skips che... Open
Assigned Teams:
Sharding NYC
Backwards Compatibility: Fully Compatible
Sprint: Sharding NYC 2023-10-02, Sharding NYC 2023-10-16
Participants:

 Description   

In particular, figure out whether it will hit this assertion and decide how to handle it.



 Comments   
Comment by Githook User [ 12/Oct/23 ]

Author: Wenqin Ye (wenqinYe) <wenqin908@gmail.com>

Message: SERVER-80249: Allow existing replica set to restart as auto-bootstrapped config shard
Branch: master
https://github.com/mongodb/mongo/commit/fdc67557c789b20b74b442720f70a473ce606a82

Comment by Wenqin Ye [ 09/Oct/23 ]

Problem and solution

When you restart an existing replica set as an auto-bootstrapped config shard, you hit this assertion, which prevents a node with ClusterRole::ConfigServer from starting if configsvr:true is not in the repl config document. To fix this, I made the server skip the assertion when auto-bootstrapping is enabled.
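The relaxed startup condition can be sketched as follows. This is a minimal model, not the server's code: `ClusterRole`, `startup_allowed`, and the `auto_bootstrap` flag are hypothetical names chosen for illustration, and the repl config is reduced to a plain dictionary.

```python
from enum import Enum, auto


class ClusterRole(Enum):
    """Hypothetical stand-in for the server's cluster role."""
    NONE = auto()
    SHARD_SERVER = auto()
    CONFIG_SERVER = auto()


def startup_allowed(role: ClusterRole, repl_config: dict, auto_bootstrap: bool) -> bool:
    """Model the assertion described above and its relaxation.

    Before the fix, a config server node could start only if the repl
    config document contained configsvr: true. After the fix, the check
    is skipped when auto-bootstrapping is enabled.
    """
    if role is not ClusterRole.CONFIG_SERVER:
        # The assertion only concerns nodes started as config servers.
        return True
    if repl_config.get("configsvr", False):
        # The repl config already marks this set as a config server set.
        return True
    # Existing replica set without configsvr: true: allowed only when
    # auto-bootstrapping is enabled.
    return auto_bootstrap
```

With this model, an existing replica set whose config lacks `configsvr` can start as a config server only when `auto_bootstrap` is true, which is the case the fix targets.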

Fixing the problem caused by the previous solution

However, this introduces another issue: a user can now start a replica set with mixed cluster roles (some nodes are config servers and some are shard servers). To fix this, I came up with a solution that ensures every node in the replica set eventually has the same cluster role:

  1. On startup or during replication, if a node sees a shard identity document that does not match its cluster role, it will crash, because its cluster role does not match the role of the primary that inserted the shard identity document. For example, if a config server sees a shard identity that is not for a config server, it will crash. Likewise, if a shard server sees a shard identity that is not for a shard server (i.e. it has _id: "config"), it will crash.

For background: the shard identity document gives information about a shard, such as the name of the shard. A config server primary inserts a shard identity document with _id: "config" when it steps up to primary. A shard server primary inserts a shard identity document as part of adding a shard replica set to a sharded cluster (the _id cannot be "config").
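The check in step 1 can be sketched as below. This is an illustrative model under stated assumptions: the shard identity document is represented as a dictionary keyed the way this comment describes it (an `_id` of "config" for a config server), which may not match the real document schema, and `RoleMismatch`, `role_of`, and `on_shard_identity` are hypothetical names. Raising the exception stands in for the node crashing.

```python
from enum import Enum, auto


class ClusterRole(Enum):
    """Hypothetical stand-in for the server's cluster role."""
    SHARD_SERVER = auto()
    CONFIG_SERVER = auto()


class RoleMismatch(Exception):
    """Stands in for the node crashing on a mismatched shard identity."""


def role_of(shard_identity: dict) -> ClusterRole:
    """Infer which role a shard identity document was written for.

    Assumption from the comment above: "config" identifies a config
    server; any other value identifies a shard server.
    """
    if shard_identity.get("_id") == "config":
        return ClusterRole.CONFIG_SERVER
    return ClusterRole.SHARD_SERVER


def on_shard_identity(node_role: ClusterRole, shard_identity: dict) -> None:
    """Run on startup or when the document arrives through replication."""
    expected = role_of(shard_identity)
    if expected is not node_role:
        raise RoleMismatch(
            f"node started as {node_role.name} but shard identity is for {expected.name}"
        )
```

Because the primary is the node that inserts the document, a node that survives this check necessarily has the same role as that primary, which is how the replica set converges on a single role.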

There will be a period of time, before the shard identity document is replicated to a secondary, during which the secondary's cluster role differs from the primary's. This is okay because, until the document is replicated, the secondary will not receive any sharding-related operations (the document is always inserted before any sharding-related operations happen). I verified that if no sharding-related operations happen, the difference in cluster roles between a primary and a secondary does not matter (for op-observers and for coordinators), though this could change in the future.
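The ordering argument above can be illustrated with a toy replication log. This is a simplified sketch, not how the server applies its oplog: entries are plain dictionaries, roles are the strings "config" and "shard", and `apply_oplog` is a hypothetical helper. It relies only on the property stated above, that the shard identity document precedes any sharding-related operation.

```python
def apply_oplog(node_role: str, oplog: list) -> tuple:
    """Apply entries in order and report what happened.

    Returns (sharding_ops_applied, crashed). The node "crashes" (stops
    applying) when it reaches a shard identity entry written for a
    different role than its own.
    """
    applied = []
    for entry in oplog:
        if entry["kind"] == "shard_identity":
            if entry["role"] != node_role:
                # Mismatched role: stop before anything later is applied.
                return applied, True
        elif entry["kind"] == "sharding_op":
            applied.append(entry["name"])
        # Any other kind is an ordinary write and is role-independent.
    return applied, False


# The primary (a config server) inserts its shard identity before any
# sharding-related operation is logged.
oplog = [
    {"kind": "write"},
    {"kind": "shard_identity", "role": "config"},
    {"kind": "sharding_op", "name": "example_op"},
]
```

In this model, a secondary started with the wrong role stops at the shard identity entry without having applied any sharding-related operation, while a secondary with the matching role continues normally.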

Generated at Thu Feb 08 06:43:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.