-
Type:
Bug
-
Resolution: Unresolved
-
Priority:
Major - P3
-
Affects Version/s: 8.3.0-rc0
-
Component/s: None
-
Catalog and Routing
-
ALL
-
🟩 Routing and Topology
During addShard, we use an executor that does not gossip the vector clock when communicating with the replica set being added. See SERVER-105324 for details; in short, gossiping can leave the config server with a config time that is too high after a failed addShard.
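To make the failure mode concrete, here is a minimal toy model of clock gossip. It is only a sketch and not MongoDB code: `Node`, `send_command`, and the numeric values are hypothetical, and real gossip flows in both directions, while this model only tracks the effect on the sender.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy node holding a single logical clock value (stands in for config time)."""
    name: str
    config_time: int


def send_command(sender: Node, receiver: Node, gossip: bool) -> None:
    """Simulate one request/response round trip (hypothetical helper).

    With gossip enabled, the response carries the receiver's clock value and
    the sender advances to the maximum of the two. Without gossip, the
    sender's clock is left untouched.
    """
    if gossip:
        sender.config_time = max(sender.config_time, receiver.config_time)


config_server = Node("config", config_time=10)
# Assumption: the replica set being added has a far more advanced clock.
replica_set = Node("candidate-shard", config_time=500)

# Non-gossiping executor: the config server's clock is not affected.
send_command(config_server, replica_set, gossip=False)
time_without_gossip = config_server.config_time

# Gossiping executor: the config server adopts the higher value, and it
# keeps that value even if addShard fails afterwards.
send_command(config_server, replica_set, gossip=True)
time_with_gossip = config_server.config_time
```

In this model the clock only moves forward, so an aborted addShard cannot undo the advance. That is the reason for using the non-gossiping executor while the operation can still fail.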
This ticket is to investigate whether, now that we have a coordinator, it is safe to use the gossiping executor once we are in a phase that must always make progress. In theory, at that point we know the shard will be added, so there is no major difference between starting to gossip there and starting after the commit. The benefit would be stronger causality guarantees for the writes on the shard (writing the shard identity and entering the critical section).
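One way to picture the proposal is a rule that picks the executor from the coordinator's current phase. The sketch below is illustrative only: the phase names, their order, and the chosen point of no return are assumptions and are not taken from the actual addShard coordinator.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical coordinator phases, ordered by execution."""
    CHECK_PRECONDITIONS = 1  # may still abort
    PREPARE_NEW_SHARD = 2    # assumed first phase that must always make progress
    COMMIT = 3
    CLEANUP = 4


# Assumption: from this phase onward the shard is guaranteed to be added.
POINT_OF_NO_RETURN = Phase.PREPARE_NEW_SHARD


def should_gossip(phase: Phase) -> bool:
    """Return True if commands sent in this phase may gossip the vector clock."""
    return phase >= POINT_OF_NO_RETURN


# Decision for every phase, in execution order.
gossip_by_phase = {phase.name: should_gossip(phase) for phase in Phase}
```

Keeping the decision in a single function like this would also avoid a mix of gossiping and non-gossiping commands within the same phase.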
Two places already do this by mistake: blocking FCV changes on the new shard and entering the critical section. So this is almost certainly safe in the normal case. However, we should check whether any edge case like the one seen in SERVER-105324 is still possible, and then unify the behavior so that we don't have a mix of gossiping and non-gossiping commands.
- is caused by
-
SERVER-100403 Adding a replica set to a sharded cluster does not register existing databases in the shard-local catalog
-
- Closed
-
-
SERVER-100963 Block FCV changes on the new shard during add shard
-
- Closed
-
- is related to
-
SERVER-105324 The vectorClock may get corrupted during addShard if the added shard has a more advanced timestamps
-
- Closed
-
-
SERVER-105234 Ignore untracked files when checking codeowners
-
- Closed
-