[SERVER-4441] Got Signal: 11 (Segmentation Fault) under heavy load | Created: 06/Dec/11 | Updated: 11/Jul/16 | Resolved: 02/Jan/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Piero Sartini | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | SunOS 5.11 oi_148 i86pc i386 i86pc, 64 GB RAM |
| Issue Links: | |
| Operating System: | Solaris |
| Participants: | |
| Description |
|
We have had this problem occasionally before, but now we have started to shard a collection with about 120 GB of data and it is not possible to keep the balancer active. After some time the mongod instances (master + 2 slaves) on the first shard begin to segfault every few minutes. This happens with 2.0.1 as well as 2.0.2-rc1. My impression is that high load leads to these segfaults.
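For anyone hitting this, one stopgap is to pause the balancer from a mongos shell while the shard members are unstable. A minimal sketch, assuming the sh.* helpers shipped with the 2.0-era shell:

    // Connect to a mongos (not a shard member directly) and check the balancer.
    sh.status()                  // shards, chunk distribution, balancer lock
    sh.getBalancerState()        // true if the balancer is currently enabled

    // Pause chunk migrations while the segfaulting shard is investigated ...
    sh.setBalancerState(false)

    // ... and turn them back on once the shard members are stable again.
    sh.setBalancerState(true)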
|
| Comments |
| Comment by Kevin Krauss [ 12/Feb/14 ] |
|
Still getting this!

    Wed Feb 12 11:21:49.327 [conn17] auth: couldn't find user yogi@yogi_berra, yogi_berra.system.users
    Wed Feb 12 14:52:18.262 Got signal: 11 (Segmentation fault: 11).
    Wed Feb 12 14:52:18.282 Backtrace: |
| Comment by Kevin Krauss [ 12/Dec/13 ] |
|
This is happening for me often on my development machine. MacBook Pro, running OSX 10.9.

    var map = function() { ... };
    var reduce = function(key, values) { ... };
    db.runCommand({ ... });

    Thu Dec 12 08:48:02.963 [initandlisten] connection accepted from 127.0.0.1:61765 #8 (2 connections now open)
    Thu Dec 12 08:48:03.040 Got signal: 11 (Segmentation fault: 11).
    Thu Dec 12 08:48:03.046 Backtrace: |
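The map/reduce bodies above are truncated. Purely as a point of reference, a mapReduce job issued through runCommand generally takes this shape; the "events" collection and "status" field below are made-up names, not taken from the report:

    // Hypothetical sketch of a mapReduce command of the kind described above.
    var map = function() {
        emit(this.status, 1);            // key = status field, value = 1 per document
    };
    var reduce = function(key, values) {
        return Array.sum(values);        // total occurrences per status
    };
    db.runCommand({
        mapReduce: "events",             // source collection
        map: map,
        reduce: reduce,
        out: { inline: 1 }               // return results inline instead of writing a collection
    });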
| Comment by Sam Kottler [ 30/May/12 ] |
|
I have seen a similar issue on our infrastructure. Here is the complete stack trace:

    Thu May 24 03:20:28 Backtrace:
    Thu May 24 03:20:28 [conn1632100] insert mq.mq_coll 129ms
    Thu May 24 03:20:29 Got signal: 11 (Segmentation fault).
    Thu May 24 03:20:29 Backtrace:
    Logstream::get called in uninitialized state
    Thu May 24 03:20:29 Got signal: 11 (Segmentation fault).
    Thu May 24 03:20:29 Backtrace:
    pure virtual method called
    Thu May 24 03:20:29 Backtrace: |
| Comment by Greg Studer [ 02/Jan/12 ] |
|
The issue isn't a race condition per se, but it depends heavily on the exact data being replicated and the timing between hosts. Definitely reopen if you continue to see this in later versions. |
| Comment by Greg Studer [ 13/Dec/11 ] |
|
I suspect the issue was a race condition, but I will verify with Eliot. |
| Comment by Piero Sartini [ 13/Dec/11 ] |
|
After switching the OS to Debian Squeeze (Linux 2.6.32-5-amd64) we cannot reproduce the error. Same hardware, same database, and similar load. |
| Comment by Greg Studer [ 07/Dec/11 ] |
|
As soon as possible on our end - we're still testing some final stuff there, and MongoSV is going on right now, so probably right after. |
| Comment by Piero Sartini [ 07/Dec/11 ] |
|
We can try rolling back to 2.0.0 tomorrow to be sure it is |
| Comment by Greg Studer [ 06/Dec/11 ] |
|
Actually this looks like |
| Comment by Piero Sartini [ 06/Dec/11 ] |
|
I've added the December log of one slave to SUPPORT-186. If you need the master log as well you can get it, but there is a warning for each insert (bad shard config), so it's bigger and will take some time to get. The MMS group is "randombit GmbH" |
| Comment by Greg Studer [ 06/Dec/11 ] |
|
If you'd like to open a ticket in the SUPPORT/Community Private group, only 10gen will be able to see it and the attachments - just mention this ticket in the description and ideally link it. What's the MMS group? In the meantime you can send the logs to greg@10gen.com, but it's hard to track issues that way, so we should keep the discussion here. |
| Comment by Piero Sartini [ 06/Dec/11 ] |
|
It is a 64-bit machine: Unfortunately I can't attach the log file publicly, but I could make it available to you (10gen). The MMS agent has been active since today as well. |
| Comment by Eliot Horowitz (Inactive) [ 06/Dec/11 ] |
|
Can you attach the full mongod log? Can you try a 64-bit machine? My first guess is that mongod is running out of RAM and that's not being handled. |
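As a quick way to answer both questions (whether the build is 64-bit and how much memory mongod is using), serverStatus can be queried from the shell; a minimal sketch:

    // Inspect build bits and memory usage via serverStatus.
    var mem = db.serverStatus().mem;
    print("bits:     " + mem.bits);      // 32 or 64 -- 32-bit builds are limited to ~2 GB of data
    print("resident: " + mem.resident);  // resident memory in MB
    print("virtual:  " + mem.virtual);   // virtual memory in MB
    print("mapped:   " + mem.mapped);    // memory-mapped data files in MB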