[SERVER-4712] Shards crash fairly frequently when memory is low Created: 18/Jan/12 Updated: 11/Jul/16 Resolved: 23/Jan/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding, Stability |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.0.3, 2.1.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordan Frank | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 0 |
| Labels: | crash, mongos, sharding | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux macbeast 2.6.32-37-server #81-Ubuntu SMP Fri Dec 2 20:49:12 UTC 2011 x86_64 GNU/Linux |
||
| Attachments: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Participants: |
| Description |
|
Another process on the box consumed a large amount of memory, and the machine started swapping and thrashing. About a minute later, the shard crashed. I've attached the log files, with the irrelevant entries removed. I would expect the server not to crash even when memory is low. This has happened 5 times in the last week. I'm sharing the box with a memory hog, but I still don't think my database should crash because another user's process goes out of control (or maybe it should; please correct me if I'm wrong). There are two files: mongodb1.log is what I saw in the shard's log, and mongos.log is what I saw in the mongos log. The config server is on port 27030; the shard that crashed was listening on port 27021. On Wednesday at 10:50 AM, I tried to kill the shard process (there's a line for this in mongodb1.log). That had no effect, so I resorted to kill -9. |
| Comments |
| Comment by auto [ 01/Feb/12 ] |
|
Author: Greg Studer (gregstuder, greg@10gen.com) Message: |
| Comment by auto [ 23/Jan/12 ] |
|
Author: Greg Studer (gregstuder, greg@10gen.com) Message: |
| Comment by Greg Studer [ 23/Jan/12 ] |
|
This is an issue with how we use exceptions in the distributed lock pinger, triggered by a connectivity problem to the config server. Are you hosting the config server locally? The workaround is to host your config servers on separate hosts so that the thrashing doesn't affect them, but there will definitely be a fix in the next incremental release. |
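For readers unfamiliar with this class of bug, here is a minimal sketch of the general failure mode: a background loop that periodically pings a remote server can take the whole process down if a connectivity error escapes it. This is not the MongoDB code or the actual fix. `ConfigServerUnreachable`, `run_pinger`, and `make_flaky_ping` are hypothetical names invented for illustration.

```python
import time


class ConfigServerUnreachable(Exception):
    """Hypothetical error raised when the config server cannot be reached."""


def run_pinger(ping, interval_s, max_pings, sleep=time.sleep):
    """Call ping() up to max_pings times, surviving connectivity failures.

    Returns the number of failed pings. A ConfigServerUnreachable raised by
    ping() is caught and counted, so the loop keeps running instead of
    letting the exception escape and terminate its host.
    """
    failures = 0
    for _ in range(max_pings):
        try:
            ping()
        except ConfigServerUnreachable:
            # Record the failure and retry on the next tick.
            failures += 1
        sleep(interval_s)
    return failures


def make_flaky_ping(fail_on):
    """Build a fake ping() that fails on the given 0-based call indexes."""
    calls = {"n": 0}

    def ping():
        i = calls["n"]
        calls["n"] += 1
        if i in fail_on:
            raise ConfigServerUnreachable(f"ping {i} timed out")

    return ping
```

In this sketch, a ping that times out (for example, because the config server's host is thrashing) is only counted, and the loop tries again on the next interval.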