The ShardingNetworkConnectionHook causes a ShardNotFound error status to be returned if the HostAndPort isn't found in the ShardRegistry. This hook is run after a connection to the remote host has been established.
Status ShardingNetworkConnectionHook::validateHostImpl(
    const HostAndPort& remoteHost, const executor::RemoteCommandResponse& isMasterReply) {
    auto shard = Grid::get(getGlobalServiceContext())->shardRegistry()->getShardForHostNoReload(remoteHost);
    if (!shard) {
        return {ErrorCodes::ShardNotFound,
                str::stream() << "No shard found for host: " << remoteHost.toString()};
    }
    ...
}
The connection string for the config shard may be updated while the sharding subsystem is initializing. (For reasons I still don't quite understand, this doesn't happen every time mongos is started, but I believe it is a necessary condition for the issue reported here to manifest.) If the connection string is updated in response to isMaster replies from secondaries of the config shard while the RSM still sees the primary as "Unknown", the HostAndPort for the primary is removed from ShardRegistry::_hostLookup.
The HostAndPort for the primary is re-added to ShardRegistry::_hostLookup as part of ShardingReplicaSetChangeListener::onConfirmedSet(), which schedules a task on the fixed executor. Because the ShardRegistry::_hostLookup map isn't updated synchronously, the RSM can consider the now-confirmed primary available for targeting primary-only reads while the validation hook that runs after the connection is established still fails with ShardNotFound. This leaves mongos unable to start up successfully.
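The ordering problem described above can be sketched with a small standalone model. This is only an illustration, not the server code: ToyHostLookup, ToyExecutor, toyValidateHost, simulateStartupRace and the host names are all hypothetical stand-ins for ShardRegistry::_hostLookup, the fixed executor, and the validate hook, and the model is single-threaded so that the window between "task scheduled" and "task run" is deterministic.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ShardRegistry::_hostLookup (not the real class).
class ToyHostLookup {
public:
    // Replaces the whole host set, as a connection string update would.
    void replaceHosts(const std::vector<std::string>& hosts) {
        _hosts = std::set<std::string>(hosts.begin(), hosts.end());
    }
    bool contains(const std::string& host) const {
        return _hosts.count(host) > 0;
    }

private:
    std::set<std::string> _hosts;
};

// Hypothetical stand-in for the fixed executor: scheduled tasks only run
// when drain() is called, which models the asynchronous delay.
class ToyExecutor {
public:
    void schedule(std::function<void()> task) {
        _tasks.push_back(std::move(task));
    }
    void drain() {
        while (!_tasks.empty()) {
            auto task = std::move(_tasks.front());
            _tasks.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> _tasks;
};

// Same shape as the validate hook: false corresponds to ShardNotFound.
bool toyValidateHost(const ToyHostLookup& lookup, const std::string& remoteHost) {
    return lookup.contains(remoteHost);
}

struct RaceOutcome {
    bool validBeforeDrain;  // hook result while the re-add task is still queued
    bool validAfterDrain;   // hook result once the re-add task has run
};

RaceOutcome simulateStartupRace() {
    ToyHostLookup lookup;
    ToyExecutor executor;
    const std::string primary = "cfg1:27019";  // assumed host name

    // Initial connection string contains all three config servers.
    lookup.replaceHosts({"cfg1:27019", "cfg2:27019", "cfg3:27019"});

    // Secondaries reply while the primary is still "Unknown": the updated
    // connection string no longer contains the primary.
    lookup.replaceHosts({"cfg2:27019", "cfg3:27019"});

    // The primary is confirmed, but re-adding it is only *scheduled*.
    executor.schedule([&lookup] {
        lookup.replaceHosts({"cfg1:27019", "cfg2:27019", "cfg3:27019"});
    });

    // A connection to the primary is validated inside the window.
    const bool before = toyValidateHost(lookup, primary);

    executor.drain();
    const bool after = toyValidateHost(lookup, primary);
    return {before, after};
}
```

In this model the hook rejects the primary until the queued task runs, which is the window in which mongos startup can fail. Applying the lookup update directly instead of scheduling it would close the window in the model.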
- is caused by
  - SERVER-47029 Fix race when streamable RSM updates the shard registry after topology change (Closed)
- is related to
  - SERVER-50997 Make ShardRegistry::updateReplSetHosts() refresh synchronously (Closed)
  - SERVER-43985 Make mongos pre-cache the routing table on startup (Closed)
  - SERVER-44152 Pre-warm connection pools in mongos (Closed)
  - SERVER-39818 Split RSM notification functionality into a new class (Closed)