- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Catalog and Routing
- Fully Compatible
- CAR Team 2025-04-14, CAR Team 2025-04-28
- 200
- 3
During the add shard operation the config server has to contact the new replica set, and vice versa.
Until now, the replica set first contacted the config server only after the shardIdentity document had been installed, when it tried to refresh the balancer configuration. If the replica set cannot reach the config server at that point, it crashes (it hits an invariant), and the config server keeps retrying because this call happens after _mustAlwaysMakeProgress. Manual intervention is then required.
In the old implementation (without the coordinator), manual intervention was not needed because there was no retryability.
This ticket adds a best-effort check of whether the replica set can contact the config server.
The config server sends the replica set a command containing the host and port of the CSRS primary, and the replica set then sends a hello command back to that host and port. If the connection can be established, the add shard proceeds; otherwise it fails.
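
The shape of this pre-check can be sketched roughly as follows. This is an illustration only, not the server implementation: the names `can_reach` and `precheck_add_shard` are hypothetical, and a plain TCP connect stands in for the real hello command, which is assumed here to be needed only as a reachability probe.

```python
import socket
from typing import Callable


def can_reach(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Best-effort probe: True if a TCP connection to host:port can be opened.

    Assumption: a TCP connect stands in for sending a real hello command.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def precheck_add_shard(
    csrs_primary_host: str,
    csrs_primary_port: int,
    reach: Callable[[str, int], bool] = can_reach,
) -> None:
    """Run on the replica set side (hypothetical): fail fast, before anything
    such as the shardIdentity document is installed, if the CSRS primary is
    unreachable."""
    if not reach(csrs_primary_host, csrs_primary_port):
        raise ConnectionError(
            f"cannot reach config server primary at "
            f"{csrs_primary_host}:{csrs_primary_port}; aborting add shard"
        )
```

Because the check is best effort, a successful probe does not guarantee that later calls will succeed. It only moves the most common failure to a point where the add shard can still fail cleanly.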