Type: Bug
Resolution: Incomplete
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 2.2.0
Component/s: Networking, Replication, Sharding
Labels:
- nh-240
Environment:
AWS, Ubuntu 12.04.1 LTS
2x shards (each shard consists of 2x replicas and 1x arbiter)

2x app servers (each running mongos)
1x background worker (running mongos)

Operating System:
Linux
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Hi,

During routine operation of our mongo cluster, the mongos process on one of our app servers became unresponsive (confirmed by ssh'ing to the app server, running mongo, and running 'show dbs').
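The manual check described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `responds_within`, the `localhost:27017` address, and the 10-second timeout are illustrative and not taken from our setup, and the probe uses the `listDatabases` admin command through the `mongo` shell's `--eval` option as a stand-in for typing 'show dbs' interactively. It only detects a hang or a failed connection, not an error document returned by the server.

```python
import subprocess


def responds_within(cmd, timeout_s):
    """Return True if cmd exits with status 0 within timeout_s seconds."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it runs past this
        )
    except subprocess.TimeoutExpired:
        return False  # the command hung, as the unresponsive mongos did
    except OSError:
        return False  # binary missing or not executable
    return result.returncode == 0


# Hypothetical probe: list databases through the local mongos.
# Host and port are assumptions; adjust them to the actual deployment.
MONGOS_PROBE = [
    "mongo", "--host", "localhost", "--port", "27017", "--quiet",
    "--eval", "printjson(db.adminCommand({listDatabases: 1}))",
]
```

Calling `responds_within(MONGOS_PROBE, 10)` from a cron job or monitoring agent on each app server would return `False` when the local mongos hangs or refuses connections, which could then trigger an alert or a restart.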

Attached is the mongos.log file covering the period from when the issue started until after mongos was manually restarted and recovered. The machine maintained full network connectivity during this time, and DNS names resolved correctly from the shell.

During this time, the mongos.log files on the other app server and the background worker were clean (just acquiring and unlocking the distributed lock).

How can we prevent this happening in future? This kind of failure is critical for us, and I'm happy to help debug/diagnose it further.

- mongos.log, 61 kB, Oct 30 2012 03:13:52 AM UTC
- mongos-2.log, 36 kB, Oct 30 2012 04:46:35 AM UTC
- mongo_send_error.tar.gz, 5.81 MB, Dec 16 2012 10:21:44 PM UTC

Assignee: Randolph Tan
Reporter: noizwaves
Participants: Barrie Segal, Eliot Horowitz, noizwaves, Randolph Tan
Votes: 0
Watchers: 7

Created: Oct 30 2012 03:13:52 AM UTC
Updated: Dec 10 2014 11:19:28 PM UTC
Resolved: May 28 2013 02:28:08 PM UTC
