Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: Sharding
ALL
mongos can get into a hung state in a sharded cluster that has a replica set shard. If the shard primary continues to accept connections but does not respond to any other requests, failover occurs and a new primary is elected normally, but sharded queries via mongos block and do not return.
This failure mode has been observed on EC2.
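The failure mode described above (a peer that still accepts connections but never answers) can be imitated without a MongoDB cluster. The following is a minimal Python sketch, not mongos code: `start_unresponsive_server` and `request_with_timeout` are hypothetical helper names, and the payload `b"ping"` is arbitrary. It only illustrates that a successful TCP connect says nothing about whether the peer will ever reply, so a client without a read timeout would block indefinitely.

```python
import socket
import threading


def start_unresponsive_server():
    """Listen on an ephemeral localhost port; accept connections but never reply.

    Hypothetical stand-in for a primary that "allows connections but does not
    respond to other requests".
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(5)
    held = []  # keep accepted sockets open so the client sees a live connection

    def accept_loop():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)  # never read from or write to the connection

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener, held


def request_with_timeout(port, timeout_s):
    """Connect, send a request, and wait at most timeout_s for any reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as client:
        client.sendall(b"ping")
        try:
            data = client.recv(1)
        except socket.timeout:
            # The connect succeeded, but no reply arrived within the timeout.
            return "connected-but-timed-out"
    return "got-response" if data else "peer-closed"
```

Calling `request_with_timeout` against the port of the unresponsive listener returns once the timeout expires; the same request on a socket with no timeout would wait forever, which is the behavior this ticket reports for mongos.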
To reproduce locally:
1) Run the included script (sets up a sharded cluster with a sharded collection)
2) Run mongo localhost:31000 to connect to the mongos
3) > use foo
4) > db.bar.find().itcount()
5) Stop all data transfer on established connections to the primary via iptables:
sudo /sbin/iptables -A INPUT -i lo -p tcp -m tcp --dport <mongod primary port> -m conntrack --ctstate ESTABLISHED -j DROP
6) (wait for failover)
7) > db.bar.find().itcount()
Result: the query in step 7 blocks and does not return.
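The iptables rule from the reproduction steps stays in place until it is deleted, so it helps to generate the add and delete commands from the same rule specification. The sketch below is a hypothetical helper (`drop_established_rule` is not part of the included script) that builds the exact command from the steps for a given primary port; passing `-D` instead of `-A` yields the command that removes the rule again. The port `31100` in the usage line is only an example value, not the port used by the script.

```python
def drop_established_rule(port, action="-A"):
    """Build the iptables command that drops established loopback traffic to port.

    action is "-A" to append the rule or "-D" to delete the same rule.
    The command is returned as an argv list and is NOT executed here.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' (append) or '-D' (delete)")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError("port must be an integer between 1 and 65535")
    return [
        "sudo", "/sbin/iptables", action, "INPUT",
        "-i", "lo",
        "-p", "tcp", "-m", "tcp", "--dport", str(port),
        "-m", "conntrack", "--ctstate", "ESTABLISHED",
        "-j", "DROP",
    ]


if __name__ == "__main__":
    # Example port only; substitute the actual mongod primary port.
    print(" ".join(drop_established_rule(31100)))        # rule used in the steps
    print(" ".join(drop_established_rule(31100, "-D")))  # cleanup afterwards
```

The returned list can be passed to `subprocess.run` if the commands should be run from a test harness rather than typed by hand.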
- is related to SERVER-4661 Mongos doesn't detect primary change if old primary lost network connectivity (Closed)
- related to SERVER-7862 Connection timeouts in mongos (Closed)