[SERVER-4712] Shards crash fairly frequently when memory is low
Created: 18/Jan/12
Updated: 11/Jul/16
Resolved: 23/Jan/12

Status: Closed
Project: Core Server
Component/s: Sharding, Stability
Affects Version/s: 2.0.2
Fix Version/s: 2.0.3, 2.1.0

Type: Bug
Priority: Major - P3
Reporter: Jordan Frank
Assignee: Greg Studer
Resolution: Done
Votes: 0
Labels: crash, mongos, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux macbeast 2.6.32-37-server #81-Ubuntu SMP Fri Dec 2 20:49:12 UTC 2011 x86_64 GNU/Linux
Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41


Attachments: mongodb1.log, mongos.log
Backwards Compatibility: Fully Compatible
Operating System: Linux
Participants:

 Description   

Another process on the box consumed a ton of memory, and the machine started swapping/thrashing. About a minute later, the shard crashed. I've attached the log files (with the irrelevant stuff cleaned out). I would assume that even when memory is low, the server shouldn't crash.

This has happened 5 times in the last week. I'm sharing the box with a memory hog, but I still don't think that my database should crash if another user's process goes out of control (or maybe it should, please correct me if I'm wrong).

There are two files: mongodb1.log is what I saw in the shard's log, and mongos.log is what I saw in the mongos log. The config server is on port 27030, and the shard that crashed was listening on port 27021.

On Wednesday at 10:50AM, I tried to kill the shard process (there's a line in the mongodb1.log). That didn't do anything, so I resorted to kill -9.



 Comments   
Comment by auto [ 01/Feb/12 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-4712 check that query results are valid before using in case of conn error
Branch: v2.0
https://github.com/mongodb/mongo/commit/9a09c9f1af58f8c3f78848cf4e158c082d1bf6be

Comment by auto [ 23/Jan/12 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-4712 check that query results are valid before using in case of conn error
Branch: master
https://github.com/mongodb/mongo/commit/29948a70afd735d7e52f56b9804e3cb66828aae2

Comment by Greg Studer [ 23/Jan/12 ]

This is an issue in how we use exceptions in the distributed lock pinger, triggered by a connectivity problem with the config server. Are you hosting the config server locally? The workaround is to host your config servers on separate hosts so the thrashing won't affect them, but there'll definitely be a fix in the next incremental release.
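
For anyone reading along, the commit message ("check that query results are valid before using in case of conn error") describes a general defensive pattern. The sketch below is only a rough Python illustration of that pattern, not the actual C++ change in the linked commits. Every name in it (`ConnectionFailure`, `first_result`, `ping_once`, `run_query`) is hypothetical, and it assumes a query helper that returns `None` when the connection drops mid-query.

```python
from typing import Callable, Optional


class ConnectionFailure(Exception):
    """Hypothetical error raised when a query result is unusable."""


def first_result(run_query: Callable[[], Optional[list]]) -> dict:
    # run_query is a hypothetical stand-in for a query against the config server.
    # Assumption: it returns None when the connection fails mid-query.
    results = run_query()
    if results is None:
        # Validate before use instead of dereferencing an invalid result.
        raise ConnectionFailure("query returned no result (connection error?)")
    if not results:
        raise LookupError("query succeeded but matched no documents")
    return results[0]


def ping_once(run_query: Callable[[], Optional[list]]) -> bool:
    # One iteration of a hypothetical background pinger. A connection problem
    # is reported to the caller so it can retry later, rather than escaping
    # the background thread as an unhandled exception.
    try:
        first_result(run_query)
        return True
    except ConnectionFailure:
        return False
```

With this shape, a pinger loop could call `ping_once` on each interval and simply retry after a `False` result, so a temporarily unreachable config server does not take the process down.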
