- Type: Bug
- Resolution: Incomplete
- Priority: Critical - P2
- None
- Affects Version/s: 2.2.0
- Component/s: Networking, Replication, Sharding
- Environment: AWS, Ubuntu 12.04.1 LTS
  2x shards (each shard consists of 2x replicas and 1x arbiter)
  2x app servers (each running mongos)
  1x background worker (running mongos)
- Linux
Hi,
During routine operation of our MongoDB cluster, the mongos process on one of our app servers became unresponsive. We confirmed this by SSHing into the app server, starting the mongo shell, and running 'show dbs'.
Attached is the mongos.log file covering the period from when the issue started until after mongos was manually restarted and recovered. The machine maintained full network connectivity during this time, and DNS names resolved correctly from the shell.
During this time, the mongos.log files on the other app server and the background worker were clean (they show only the distributed lock being acquired and unlocked).
How can we prevent this from happening in the future? This kind of failure is critical for us, and I'm happy to help debug or diagnose it further.