Problem and solution
When you restart an existing replica set as an auto-bootstrapped config shard, you hit this assertion, which prevents a node with ClusterRole::ConfigServer from starting if configsvr:true is not in the replica set config document. To fix this, I skip the assertion when auto-bootstrapping is enabled.
Fixing the problem caused by the previous solution
Skipping the assertion causes another issue: a user can now start a replica set with mixed cluster roles (some nodes are config servers and some are shard servers). To fix this, I added a check so that every node in the replica set eventually has the same cluster role:
- On startup or during replication, if a node sees a shard identity document that does not align with its cluster role, it will crash, because its cluster role does not match the role of the primary that inserted the shard identity document. For example, if a config server sees a shard identity that is not for a config server, it will crash. Likewise, if a shard server sees a shard identity that is not for a shard server (i.e., it has _id: "config"), it will crash.
For background knowledge: the shard identity document is a document that gives information about a shard, such as the name of the shard. A config server primary inserts a shard identity document with id: "config" on step-up to primary. A shard server primary inserts a shard identity document as part of the process for adding a shard replica set to a sharded cluster (the id cannot be "config").
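The role check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this change: `NodeRole`, `roleImpliedByShardIdentity`, and `shardIdentityConflictsWithRole` are hypothetical names, and the shard identity is reduced to the single identifier string that the description compares against "config".

```cpp
#include <string_view>

// Hypothetical stand-in for the node's cluster role; not the server's real ClusterRole type.
enum class NodeRole { ConfigServer, ShardServer };

// Per the description, an identity of "config" belongs to a config server;
// any other identity belongs to a shard server.
constexpr NodeRole roleImpliedByShardIdentity(std::string_view shardId) {
    return shardId == "config" ? NodeRole::ConfigServer : NodeRole::ShardServer;
}

// Returns true when the replicated shard identity disagrees with the node's own
// role. In that case the node is expected to crash rather than keep running
// with a role that differs from the primary's.
constexpr bool shardIdentityConflictsWithRole(NodeRole nodeRole, std::string_view shardId) {
    return roleImpliedByShardIdentity(shardId) != nodeRole;
}
```

With this shape, a node started as a shard server that replicates an identity of "config" reports a conflict, while a config server seeing the same identity does not.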
Until the shard identity document is replicated to a secondary, that secondary's cluster role can differ from the primary's. This is okay because the node will not receive any sharding-related operations before the document is replicated (the document is always inserted before sharding-related operations happen). I verified that, as long as no sharding-related operations happen, the difference in cluster roles between a primary and a secondary does not matter (for op-observers and for coordinators), though this can change in the future.