[SERVER-4441] Got Signal: 11 (Segmentation Fault) under heavy load Created: 06/Dec/11  Updated: 11/Jul/16  Resolved: 02/Jan/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Piero Sartini Assignee: Greg Studer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

SunOS 5.11 oi_148 i86pc i386 i86pc, 64 GB RAM


Issue Links:
Depends
depends on SERVER-4350 Segmentation fault on replica recovery Closed
Related
Operating System: Solaris
Participants:

 Description   

We had seen this problem occasionally before, but since we started sharding a collection with about 120 GB of data it has not been possible to keep the balancer active.

After some time, the mongod instances (master + 2 slaves) on the first shard begin to segfault every few minutes.
Once I disable the balancer, the mongod instances keep running again.

This happens with 2.0.1 as well as 2.0.2-rc1. My impression is that high load leads to these segfaults.
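
For context, disabling the balancer on a 2.0-era cluster is done against the config database through a mongos. A minimal mongo shell sketch (the exact commands used are an assumption, not taken from the report):

// Run against a mongos. Stops further chunk migrations until re-enabled.
var conf = db.getSiblingDB("config");
conf.settings.update({ _id: "balancer" }, { $set: { stopped: true } }, true); // upsert
conf.settings.find({ _id: "balancer" });   // should show { _id: "balancer", stopped: true }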

Sun Dec 4 17:20:37 Invalid access at address: 0xfffffd59fedb32cc

Sun Dec 4 17:20:38 Got signal: 11 (Segmentation Fault).

Sun Dec 4 17:20:38 Backtrace:

Logstream::get called in uninitialized state
Sun Dec 4 17:20:41 [conn49] end connection 192.168.151.20:39137
Logstream::get called in uninitialized state
Sun Dec 4 17:20:41 [initandlisten] connection accepted from 192.168.151.20:37360 #50
Logstream::get called in uninitialized state
Sun Dec 4 17:20:41 ERROR: Client::~Client _context should be null but is not; client:rsSync
Logstream::get called in uninitialized state
Sun Dec 4 17:20:41 ERROR: Client::shutdown not called: rsSync



 Comments   
Comment by Kevin Krauss [ 12/Feb/14 ]

Still getting this!

Wed Feb 12 11:21:49.327 [conn17] auth: couldn't find user yogi@yogi_berra, yogi_berra.system.users
Wed Feb 12 11:36:48.282 [conn15] end connection 127.0.0.1:49873 (1 connection now open)
Wed Feb 12 11:37:22.812 [conn17] end connection 127.0.0.1:49977 (0 connections now open)
Wed Feb 12 14:32:36.572 [initandlisten] connection accepted from 127.0.0.1:51840 #18 (1 connection now open)
Wed Feb 12 14:52:18.262 Invalid access at address: 0x10 from thread: conn18

Wed Feb 12 14:52:18.262 Got signal: 11 (Segmentation fault: 11).

Wed Feb 12 14:52:18.282 Backtrace:
0x1011c19e0 0x100cd227d 0x100cd25b8 0x7fff92d6a5aa 0x10f51b368 0x1012f4097 0x1013c1699 0x1013c1501 0x10117b5cb 0x10117b48f 0x10117636a 0x1011757cf 0x100e1d675 0x100e270a7 0x100e4b055 0x100e4c013 0x100e4cdf6 0x100f6104d 0x100f67468 0x100f0492a
0 mongod 0x00000001011c19e0 _ZN5mongo15printStackTraceERSo + 64
1 mongod 0x0000000100cd227d _ZN5mongo10abruptQuitEi + 397
2 mongod 0x0000000100cd25b8 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 344
3 libsystem_platform.dylib 0x00007fff92d6a5aa _sigtramp + 26
4 ??? 0x000000010f51b368 0x0 + 4551979880
5 mongod 0x00000001012f4097 _ZN2v88internal15DeoptimizerDataD1Ev + 55
6 mongod 0x00000001013c1699 _ZN2v88internal7Isolate6DeinitEv + 105
7 mongod 0x00000001013c1501 _ZN2v88internal7Isolate8TearDownEv + 81
8 mongod 0x000000010117b5cb _ZN5mongo7V8ScopeD2Ev + 267
9 mongod 0x000000010117b48f _ZN5mongo7V8ScopeD0Ev + 15
10 mongod 0x000000010117636a _ZN5mongo11PooledScopeD2Ev + 842
11 mongod 0x00000001011757cf _ZN5mongo11PooledScopeD0Ev + 15
12 mongod 0x0000000100e1d675 _ZN5mongo2mr5StateD2Ev + 341
13 mongod 0x0000000100e270a7 _ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 6455
14 mongod 0x0000000100e4b055 _ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 37
15 mongod 0x0000000100e4c013 _ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 2915
16 mongod 0x0000000100e4cdf6 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 886
17 mongod 0x0000000100f6104d _ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 45
18 mongod 0x0000000100f67468 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 1112
19 mongod 0x0000000100f0492a _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE + 1338

Comment by Kevin Krauss [ 12/Dec/13 ]

This is happening often for me on my development machine, a MacBook Pro running OS X 10.9.
I am using mongo version 2.4.8.
It could be related to this bug https://jira.mongodb.org/browse/SERVER-11671, because it started happening when I began using map-reduce functions.
My first couple of attempts at writing the functions weren't valid JavaScript, so I had errors and it started crashing.
Now it crashes about every 5-10 requests.

var map = function() {
    emit(this.error_message, { count: 1, date: this.created_at });
};

var reduce = function(key, values) {
    var result = { count: 0, dates: [] };
    values.forEach(function(value) {
        if (value.date) {
            result.count += value.count;
            result.dates.push(value.date);
        }
    });
    return result;
};

db.runCommand({
    mapReduce: "caught_exceptions",
    map: map,
    reduce: reduce,
    out: { inline: 1 },
    query: {
        error_message: { $exists: true, $ne: "" }
    }
});
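
Editorial note: the reduce function above is not re-reduce safe - map emits { count, date } while reduce returns { count, dates }, so when reduce is fed its own output the value.date check fails and counts are dropped. A sketch of a re-reduce-safe pair, assuming the same document shape (an illustration, not the reporter's code):

var map = function() {
    // emit arrays so map output and reduce output have the same shape
    emit(this.error_message, { count: 1, dates: this.created_at ? [this.created_at] : [] });
};

var reduce = function(key, values) {
    var result = { count: 0, dates: [] };
    values.forEach(function(value) {
        result.count += value.count;
        result.dates = result.dates.concat(value.dates);
    });
    return result;
};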

Thu Dec 12 08:48:02.963 [initandlisten] connection accepted from 127.0.0.1:61765 #8 (2 connections now open)
Thu Dec 12 08:48:03.040 Invalid access at address: 0x10 from thread: conn8

Thu Dec 12 08:48:03.040 Got signal: 11 (Segmentation fault: 11).

Thu Dec 12 08:48:03.046 Backtrace:
0x10aef69e0 0x10aa0727d 0x10aa075b8 0x7fff907e65aa 0x10b205ed1 0x10b029097 0x10b0f6699 0x10b0f6501 0x10aeb05cb 0x10aeb048f 0x10aeab36a 0x10aeaa7cf 0x10ab52675 0x10ab5c0a7 0x10ab80055 0x10ab81013 0x10ab81df6 0x10ac9604d 0x10ac9c468 0x10ac3992a
0 mongod 0x000000010aef69e0 _ZN5mongo15printStackTraceERSo + 64
1 mongod 0x000000010aa0727d _ZN5mongo10abruptQuitEi + 397
2 mongod 0x000000010aa075b8 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 344
3 libsystem_platform.dylib 0x00007fff907e65aa _sigtramp + 26
4 mongod 0x000000010b205ed1 _ZN2v88internal9StubCache17ComputeCallGlobalEiNS0_4Code4KindEiNS0_6HandleINS0_6StringEEENS4_INS0_8JSObjectEEENS4_INS0_12GlobalObjectEEENS4_INS0_20JSGlobalPropertyCellEEENS4_INS0_10JSFunctionEEE + 641
5 mongod 0x000000010b029097 _ZN2v88internal15DeoptimizerDataD1Ev + 55
6 mongod 0x000000010b0f6699 _ZN2v88internal7Isolate6DeinitEv + 105
7 mongod 0x000000010b0f6501 _ZN2v88internal7Isolate8TearDownEv + 81
8 mongod 0x000000010aeb05cb _ZN5mongo7V8ScopeD2Ev + 267
9 mongod 0x000000010aeb048f _ZN5mongo7V8ScopeD0Ev + 15
10 mongod 0x000000010aeab36a _ZN5mongo11PooledScopeD2Ev + 842
11 mongod 0x000000010aeaa7cf _ZN5mongo11PooledScopeD0Ev + 15
12 mongod 0x000000010ab52675 _ZN5mongo2mr5StateD2Ev + 341
13 mongod 0x000000010ab5c0a7 _ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 6455
14 mongod 0x000000010ab80055 _ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 37
15 mongod 0x000000010ab81013 _ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 2915
16 mongod 0x000000010ab81df6 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 886
17 mongod 0x000000010ac9604d _ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 45
18 mongod 0x000000010ac9c468 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 1112
19 mongod 0x000000010ac3992a _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE + 1338

Comment by Sam Kottler [ 30/May/12 ]

I have seen a similar issue on our infrastructure. Here is the complete stack trace:

Thu May 24 03:20:28 Backtrace:
0xa89b19 0x7ffaec686af0 0x7ffaec686a75 0x7ffaec68a5c0 0x87ffb7 0x76134b 0x7615bd 0x761d4b 0xaa4560 0x7ffaed18a9ca 0x7ffaec73970d
/ebs/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0xa89b19]
/lib/libc.so.6(+0x33af0) [0x7ffaec686af0]
/lib/libc.so.6(gsignal+0x35) [0x7ffaec686a75]
/lib/libc.so.6(abort+0x180) [0x7ffaec68a5c0]
/ebs/mongodb/bin/mongod(_ZN5mongo10mongoAbortEPKc+0x47) [0x87ffb7]
/ebs/mongodb/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0xcb) [0x76134b]
/ebs/mongodb/bin/mongod() [0x7615bd]
/ebs/mongodb/bin/mongod(_ZN5mongo3dur9durThreadEv+0x10b) [0x761d4b]
/ebs/mongodb/bin/mongod(thread_proxy+0x80) [0xaa4560]
/lib/libpthread.so.0(+0x69ca) [0x7ffaed18a9ca]
/lib/libc.so.6(clone+0x6d) [0x7ffaec73970d]

Thu May 24 03:20:28 [conn1632100] insert mq.mq_coll 129ms
Thu May 24 03:20:28 [conn1632110] query kv.kv_collection ntoreturn:1 nscanned:2 nreturned:1 reslen:97 5731ms
Thu May 24 03:20:28 [conn1634789] query action.pending_coll ntoreturn:50 reslen:20 483ms
Thu May 24 03:20:28 [conn1632131] query kv.kv_collection ntoreturn:1 nscanned:2 nreturned:1 reslen:97 15285ms
Logstream::get called in uninitialized state
Thu May 24 03:20:29 [conn1632100] insert mq.mq_coll 114ms
Logstream::get called in uninitialized state
Thu May 24 03:20:29 [conn1628299] insert mevent_collection.mevents 273ms
Thu May 24 03:20:29 Invalid access at address: 0x10000001f

Thu May 24 03:20:29 Got signal: 11 (Segmentation fault).

Thu May 24 03:20:29 Backtrace:
0xa89b19 0xa8a0f0 0x7ffaed1938f0 0x797fbb 0x9736e1 0x8c51b6 0x8c53e1 0x8c54ea 0x8d49af 0x8d50e6 0x8d8850 0x8da333 0x8db607 0x964369 0x882407 0x888c2c 0xa9c576 0x638937 0x7ffaed18a9ca 0x7ffaec73970d
/ebs/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0xa89b19]
/ebs/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0xa8a0f0]
/lib/libpthread.so.0(+0xf8f0) [0x7ffaed1938f0]
/ebs/mongodb/bin/mongod(_ZN5mongo12ClientCursor16recoverFromYieldERKNS0_9YieldDataE+0x3b) [0x797fbb]
/ebs/mongodb/bin/mongod(_ZN5mongo11UserQueryOp16recoverFromYieldEv+0x191) [0x9736e1]
/ebs/mongodb/bin/mongod(_ZN5mongo12QueryPlanSet6Runner18recoverFromYieldOpERNS_7QueryOpE+0x56) [0x8c51b6]
/ebs/mongodb/bin/mongod(_ZN5mongo12QueryPlanSet6Runner16recoverFromYieldEv+0x21) [0x8c53e1]
/ebs/mongodb/bin/mongod(_ZN5mongo12QueryPlanSet6Runner8mayYieldEv+0xca) [0x8c54ea]
/ebs/mongodb/bin/mongod(_ZN5mongo12QueryPlanSet6Runner4nextEv+0x2f) [0x8d49af]
/ebs/mongodb/bin/mongod(_ZN5mongo12QueryPlanSet6Runner22runUntilFirstCompletesEv+0x56) [0x8d50e6]
/ebs/mongodb/bin/mongod(_ZN5mongo12QueryPlanSet5runOpERNS_7QueryOpE+0x50) [0x8d8850]
/ebs/mongodb/bin/mongod(_ZN5mongo16MultiPlanScanner9runOpOnceERNS_7QueryOpE+0x523) [0x8da333]
/ebs/mongodb/bin/mongod(_ZN5mongo16MultiPlanScanner5runOpERNS_7QueryOpE+0x17) [0x8db607]
/ebs/mongodb/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xa79) [0x964369]
/ebs/mongodb/bin/mongod() [0x882407]
/ebs/mongodb/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x55c) [0x888c2c]
/ebs/mongodb/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76) [0xa9c576]
/ebs/mongodb/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x638937]
/lib/libpthread.so.0(+0x69ca) [0x7ffaed18a9ca]
/lib/libc.so.6(clone+0x6d) [0x7ffaec73970d]

Logstream::get called in uninitialized state
Thu May 24 03:20:29 [conn1634767] ERROR: Client::~Client _context should be null but is not; client:conn
Logstream::get called in uninitialized state
Thu May 24 03:20:29 [conn1634767] ERROR: Client::shutdown not called: conn
Thu May 24 03:20:29 Invalid access at address: 0

Thu May 24 03:20:29 Got signal: 11 (Segmentation fault).

Thu May 24 03:20:29 Backtrace:
0xa89b19 0xa8a0f0 0x7ffaed1938f0 0x7ffaec68c1b5 0xa89d6e 0x7ffaec686af0 0x7ffaec686a75 0x7ffaec68a5c0 0x87ffb7 0x76134b 0x7615bd 0x761d4b 0xaa4560 0x7ffaed18a9ca 0x7ffaec73970d
/ebs/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0xa89b19]
/ebs/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0xa8a0f0]
/lib/libpthread.so.0(+0xf8f0) [0x7ffaed1938f0]
/lib/libc.so.6(exit+0x35) [0x7ffaec68c1b5]
/ebs/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x5ee) [0xa89d6e]
/lib/libc.so.6(+0x33af0) [0x7ffaec686af0]
/lib/libc.so.6(gsignal+0x35) [0x7ffaec686a75]
/lib/libc.so.6(abort+0x180) [0x7ffaec68a5c0]
/ebs/mongodb/bin/mongod(_ZN5mongo10mongoAbortEPKc+0x47) [0x87ffb7]
/ebs/mongodb/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0xcb) [0x76134b]
/ebs/mongodb/bin/mongod() [0x7615bd]
/ebs/mongodb/bin/mongod(_ZN5mongo3dur9durThreadEv+0x10b) [0x761d4b]
/ebs/mongodb/bin/mongod(thread_proxy+0x80) [0xaa4560]
/lib/libpthread.so.0(+0x69ca) [0x7ffaed18a9ca]
/lib/libc.so.6(clone+0x6d) [0x7ffaec73970d]

pure virtual method called
Thu May 24 03:20:29 terminate() called, printing stack:
0xa8903d 0x7ffaecf3ad16 0x7ffaecf3ad43 0x7ffaecf3b61f 0x976674 0x9783b5 0x961745 0x964ad1 0x882407 0x888c2c 0xa9c576 0x638937 0x7ffaed18a9ca 0x7ffaec73970d
/ebs/mongodb/bin/mongod(_ZN5mongo11myterminateEv+0x4d) [0xa8903d]
/usr/lib/libstdc++.so.6(+0xcad16) [0x7ffaecf3ad16]
/usr/lib/libstdc++.so.6(+0xcad43) [0x7ffaecf3ad43]
/usr/lib/libstdc++.so.6(+0xcb61f) [0x7ffaecf3b61f]
/ebs/mongodb/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x5c4) [0x976674]
/ebs/mongodb/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x705) [0x9783b5]
/ebs/mongodb/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x35) [0x961745]
/ebs/mongodb/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x11e1) [0x964ad1]
/ebs/mongodb/bin/mongod() [0x882407]
/ebs/mongodb/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x55c) [0x888c2c]
/ebs/mongodb/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76) [0xa9c576]
/ebs/mongodb/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x638937]
/lib/libpthread.so.0(+0x69ca) [0x7ffaed18a9ca]
/lib/libc.so.6(clone+0x6d) [0x7ffaec73970d]
Thu May 24 03:20:29 Got signal: 6 (Aborted).

Thu May 24 03:20:29 Backtrace:
0xa89b19 0x7ffaec686af0 0x7ffaec686a75 0x7ffaec68a5c0 0xa8910b 0x7ffaecf3ad16 0x7ffaecf3ad43 0x7ffaecf3b61f 0x976674 0x9783b5 0x961745 0x964ad1 0x882407 0x888c2c 0xa9c576 0x638937 0x7ffaed18a9ca 0x7ffaec73970d
/ebs/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0xa89b19]
/lib/libc.so.6(+0x33af0) [0x7ffaec686af0]
/lib/libc.so.6(gsignal+0x35) [0x7ffaec686a75]
/lib/libc.so.6(abort+0x180) [0x7ffaec68a5c0]
/ebs/mongodb/bin/mongod(_ZN5mongo11myterminateEv+0x11b) [0xa8910b]
/usr/lib/libstdc++.so.6(+0xcad16) [0x7ffaecf3ad16]
/usr/lib/libstdc++.so.6(+0xcad43) [0x7ffaecf3ad43]
/usr/lib/libstdc++.so.6(+0xcb61f) [0x7ffaecf3b61f]
/ebs/mongodb/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x5c4) [0x976674]
/ebs/mongodb/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x705) [0x9783b5]
/ebs/mongodb/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x35) [0x961745]
/ebs/mongodb/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x11e1) [0x964ad1]
/ebs/mongodb/bin/mongod() [0x882407]
/ebs/mongodb/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x55c) [0x888c2c]
/ebs/mongodb/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76) [0xa9c576]
/ebs/mongodb/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x638937]
/lib/libpthread.so.0(+0x69ca) [0x7ffaed18a9ca]
/lib/libc.so.6(clone+0x6d) [0x7ffaec73970d]

Comment by Greg Studer [ 02/Jan/12 ]

The issue isn't a race condition per se, but it depends heavily on the exact data being replicated and the timing between hosts. Definitely reopen if you continue to see this in later versions.

Comment by Greg Studer [ 13/Dec/11 ]

I suspect the issue was a race condition, but I will verify with Eliot.

Comment by Piero Sartini [ 13/Dec/11 ]

After switching the OS to Debian Squeeze (Linux 2.6.32-5-amd64) we cannot reproduce the error.
It is now possible to enable the balancer without running into it.

Same hardware, same database and similar load.

Comment by Greg Studer [ 07/Dec/11 ]

As soon as possible on our end - we're still testing some final stuff there, and MongoSV is going on right now, so probably right after.

Comment by Piero Sartini [ 07/Dec/11 ]

We can test rolling back to 2.0.0 tomorrow to be sure it is SERVER-4350.
Since there is no time pressure to re-enable balancing, we will continue to use 2.0.1 afterward and wait for 2.0.2. Any hint as to when it will be available?

Comment by Greg Studer [ 06/Dec/11 ]

Actually this looks like SERVER-4350 - one way to check would be to roll back to 2.0.0 mongod (though probably best only to test, since 2.0.1 contains other important fixes). 2.0.2 will have a fix for 4350, but it isn't fully integrated and tested in the rc yet.

Comment by Piero Sartini [ 06/Dec/11 ]

I've added the December log of one slave to SUPPORT-186. If you need the master log as well you can get it, but there is a warning for each insert (bad shard config), so it is bigger and takes some time to retrieve.

The MMS group is "randombit GmbH"

Comment by Greg Studer [ 06/Dec/11 ]

If you'd like to open a ticket in the SUPPORT/Community Private group, only 10gen will be able to see it and the attachments - just mention this ticket in the description and ideally link it. What's the MMS group?

In the meantime you can send the logs to greg@10gen.com, but it's hard to track issues that way, so we should keep the discussion here.

Comment by Piero Sartini [ 06/Dec/11 ]

It is a 64-bit machine:

root@machine:~# isainfo -kv
64-bit amd64 kernel modules

Unfortunately I can't attach the log file in public, but I could make it available to you (10gen).
May I send it to your e-mail? I could not find an option to restrict attachments in Jira.

The MMS agent has been active since today as well.

Comment by Eliot Horowitz (Inactive) [ 06/Dec/11 ]

Can you attach the full mongod log?

Can you try a 64-bit machine? My first guess is that mongod is running out of RAM and this is not being handled.
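
For reference, a quick way to confirm the build and watch memory usage from the mongo shell (standard buildInfo/serverStatus commands; field names as in the 2.x server):

db.runCommand({ buildInfo: 1 }).bits   // 64 on a 64-bit mongod build
db.serverStatus().mem                  // { bits, resident, virtual, mapped, ... } in MB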
