Priority: Major - P3
Resolution: Works as Designed
Affects Version/s: 2.2.1
Fix Version/s: None
Environment: Linux RHEL 5.5
The ReplicaSetMonitor refreshes its view of the replica set every 10 seconds, or when a new operation is requested on a replica set connection that had errored out. The problem arises when the set's membership changes so completely that none of the new members appear in the member list the ReplicaSetMonitor currently holds. As a direct consequence, the monitor has no way to contact any of the new members.
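The failure mode described above can be illustrated with a small simulation. This is only a sketch of the general idea, not the actual ReplicaSetMonitor code: the function name refresh_view and all host names are hypothetical, and the real monitor's refresh logic is more involved.

```python
def refresh_view(known_hosts, live_members):
    """Simulate one refresh: ask each known host for the current member list.

    known_hosts  -- hosts the monitor currently holds (its only seeds)
    live_members -- dict mapping each reachable host to the member list it reports
    Returns the member list learned, or None if no known host answers.
    """
    for host in known_hosts:
        members = live_members.get(host)
        if members is not None:
            # One surviving seed is enough to learn the whole new configuration.
            return list(members)
    # Every known host is gone, so the new members can never be discovered.
    return None


# Hypothetical host names: the set has been reconfigured to use only "-ib" names.
new_config = ["a-ib:27017", "b-ib:27017"]
live = {host: new_config for host in new_config}

# Partial overlap: one known host is still a member, so the view recovers.
recovered = refresh_view(["a:27017", "a-ib:27017"], live)

# No overlap: none of the known hosts exist any more, so the monitor is stranded.
stranded = refresh_view(["a:27017", "b:27017"], live)
```

In the second case the monitor has no seed left to query, which matches the situation reported here for rs1.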
The only workaround for this issue is to manually edit the config.shards collection and restart all mongos processes.
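As a rough sketch of what such a manual edit involves, the snippet below builds the filter and update documents for one shard entry. It assumes the usual config.shards layout in which the host field holds "<setName>/<host1>,<host2>,..."; the helper names and host names are hypothetical, and the documents would still have to be applied by hand against the config database.

```python
def shard_host_string(set_name, members):
    """Build the host value for a shard entry: '<setName>/<host1>,<host2>,...'."""
    return set_name + "/" + ",".join(members)


def build_shard_update(shard_id, set_name, members):
    """Return (filter, update) documents that point one shard at new members."""
    filter_doc = {"_id": shard_id}
    update_doc = {"$set": {"host": shard_host_string(set_name, members)}}
    return filter_doc, update_doc


# Hypothetical new host names for rs1 after the rename.
filter_doc, update_doc = build_shard_update(
    "rs1", "rs1", ["c-ib:27017", "d-ib:27017"]
)

# Applying it is left as a manual step, for example with pymongo connected
# to the config database (connection details are an assumption):
#   client.config.shards.update_one(filter_doc, update_doc)
```

After the entry is changed, every mongos still needs to be restarted, as noted above, so that it reloads the shard definition.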
Attached a script, test.patch, that demonstrates this problem.
Original bug report:
We had an issue whereby our MongoDB config servers didn't notice when the host names in one shard were changed.
We have two shards:
These were changed in the replicaset to:
The '-ib' names refer to different (InfiniBand) interfaces on the hosts with the same names.
The replica sets appeared to be happy and in sync for both rs0 and rs1. However, only rs0 was updated in the config servers' shards collection!
The entire cluster was rebooted over the weekend. Two days later, the config.shards collection still had not picked up the new hostnames for rs1. Killing and restarting the config servers and the mongos processes since then hasn't helped either.
That said, the cluster appeared to work fine until we enabled sharding on a database. At that point, the mongos processes and pymongo clients started failing (see attached assertions and backtraces).
The error seen in pymongo:
The error seen on the mongos:
We ended up fixing this by manually changing the rs1 location in the config.shards collection.
According to this newsgroup posting by Eliot, this shouldn't be necessary: