[SERVER-15920] Segmentation fault on mongod processes and mongo shell Created: 03/Nov/14  Updated: 11/Aug/15  Resolved: 14/Nov/14

Status: Closed
Project: Core Server
Component/s: Shell, Stability
Affects Version/s: 2.6.5
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Anthony Brodard
Assignee: Unassigned
Resolution: Duplicate
Votes: 1
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS : Gentoo Linux
Kernel 3.9.9
GCC : tested on 4.6.4 / 4.7.3-r1 / 4.8.3
glibc : tested on 2.15-r3 and 2.19-r1
Boost : tested on 1.53.0 and 1.55.0-r2


Attachments: mongo_shell.txt, mongo_shell_strace.txt, mongod.txt
Issue Links:
Duplicate
duplicates SERVER-12991 Segmentation fault during V8 initiali... Closed
is duplicated by SERVER-19248 Segmentation fault running query with... Closed
Operating System: Linux
Participants:

 Description   

Since we upgraded from 2.4.10 to 2.6.5, some nodes crash randomly with a segmentation fault (mongod.txt). On these nodes, the mongo shell also fails to start and segfaults on startup (mongo_shell.txt).
We have tried recompiling MongoDB with different versions of GCC and glibc, which did not solve the problem. The result is the same when we disable all the compilation options (use-system-yaml, use-system-boost, use-system-pcre...).
The default compilation options are available in this ebuild : http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/dev-db/mongodb/mongodb-2.6.5.ebuild?view=markup

Other nodes have the same configuration, and mongo runs perfectly on them.



 Comments   
Comment by Darko Luketic [ 20/Jan/15 ]

Thanks Ramon, I'll do so as soon as I find the time to.
Just a quick paxctl output:
paxctl -v /usr/bin/mongo
PaX control v0.9
Copyright 2004,2005,2006,2007,2009,2010,2011,2012,2014 PaX Team <pageexec@freemail.hu>

  • PaX flags: -----m-x-e-r [/usr/bin/mongo]
    MPROTECT is disabled
    RANDEXEC is disabled
    EMUTRAMP is disabled
    RANDMMAP is disabled
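
For anyone comparing flag strings across nodes, here is a minimal Python sketch that decodes a paxctl flag string such as the one above. It assumes the string has one slot per letter in the order PpSsMmXxEeRr, with uppercase meaning enabled, lowercase meaning disabled, and "-" meaning unset. That layout is an assumption inferred from this output, and decode_pax_flags is a hypothetical helper, not part of paxctl.

```python
# Assumed slot layout of the paxctl flag string (inferred from the output
# above, not taken from paxctl documentation): one slot per letter, in order.
SLOTS = "PpSsMmXxEeRr"

# Flag names as printed by paxctl for each letter.
NAMES = {
    "P": "PAGEEXEC",
    "S": "SEGMEXEC",
    "M": "MPROTECT",
    "X": "RANDEXEC",
    "E": "EMUTRAMP",
    "R": "RANDMMAP",
}


def decode_pax_flags(flags):
    """Map a 12-character flag string to {flag name: "enabled" | "disabled"}."""
    if len(flags) != len(SLOTS):
        raise ValueError("expected %d characters, got %d" % (len(SLOTS), len(flags)))
    state = {}
    for slot, ch in zip(SLOTS, flags):
        if ch == "-":
            continue  # this slot is unset
        if ch != slot:
            raise ValueError("unexpected character %r in slot %r" % (ch, slot))
        state[NAMES[ch.upper()]] = "enabled" if ch.isupper() else "disabled"
    return state


if __name__ == "__main__":
    for name, value in decode_pax_flags("-----m-x-e-r").items():
        print("%s is %s" % (name, value))
```

Under that assumed layout, the string -----m-x-e-r decodes to the same four "disabled" lines that paxctl prints above.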

I'll open a new ticket in the next few days.
Thanks,
Darko

Comment by Ramon Fernandez Marina [ 19/Jan/15 ]

dluketic, the information in the link you sent shows a crash in V8, so the principal suspect is some configured limitation that's preventing V8 from operating properly. Please open a new ticket posting the stack trace, the output of 'paxctl -v /usr/bin/mongo', and any other information you think may be useful for us to track the issue down.

Thanks,
Ramón.

Comment by Darko Luketic [ 19/Jan/15 ]

I'm not using a GRSEC enabled kernel and it's still happening.
https://bugs.gentoo.org/show_bug.cgi?id=536760

Comment by Ramon Fernandez Marina [ 14/Nov/14 ]

Glad to hear you found the cause of the issue anthony@1000mercis.com, and thanks to ultrabug for the assist. Closing this ticket as a duplicate of SERVER-12991.

Comment by Anthony Brodard [ 12/Nov/14 ]

Hi,

This bug happens on GRSEC-enabled kernels. It is related to https://jira.mongodb.org/browse/SERVER-12991
As mentioned in the Gentoo issue, the ebuild has been updated.

Thanks to Ramon and Ultrabug for your help, this issue can be marked as resolved.
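
A rough way to check whether a given node is affected is to test whether a process may create memory that is both writable and executable. The Python sketch below rests on two assumptions that this ticket does not confirm: that V8's allocator requests such mappings, and that a hardened kernel refuses them. The can_map_rwx function is a hypothetical helper, and the script only runs on Unix-like systems.

```python
import mmap


def can_map_rwx(size=mmap.PAGESIZE):
    """Return True if an anonymous read/write/execute mapping can be created.

    Assumption: this resembles the kind of mapping a JIT engine asks for.
    A False result only says that the kernel or a security policy refused
    the request. It does not prove that this is the cause of the crash.
    """
    try:
        region = mmap.mmap(
            -1,  # no backing file: anonymous memory
            size,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return False
    region.close()
    return True


if __name__ == "__main__":
    if can_map_rwx():
        print("RWX anonymous mapping allowed")
    else:
        print("RWX anonymous mapping refused")
```

Note that the result applies to the Python interpreter running the script, which may carry different PaX flags than /usr/bin/mongo or /usr/bin/mongod, so treat it as a hint only.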

Comment by Ultrabug [ 09/Nov/14 ]

Hi

For reference, we're not the only ones impacted. This is the related Gentoo bug providing other details if needed : https://bugs.gentoo.org/show_bug.cgi?id=526114

Comment by Anthony Brodard [ 05/Nov/14 ]

Hi Ramon,

We have built mongodb with the debugging option :

scons -j7 --variant-dir=build --cc=x86_64-pc-linux-gnu-gcc --cxx=x86_64-pc-linux-gnu-g++ --disable-warnings-as-errors --use-system-boost --use-system-pcre --use-system-snappy --use-system-stemmer --use-system-tcmalloc --use-system-yaml --dbg=on --usev8 --ssl all

Now, the mongo shell displays this error:

 ~ $ mongo
MongoDB shell version: 2.6.5
 
 
#
# Fatal error in src/third_party/v8/src/isolate.h, line 830
# CHECK(logger_ != __null) failed
#
 
 
==== Stack trace is not available ==========================
 
 
==== Isolate for the thread is not initialized =============
 
Trace/breakpoint trap

Tell me if you want the mongo and/or mongod binaries, or any other information.

Anthony

Comment by Ultrabug [ 03/Nov/14 ]

If it can help a bit, I'm attaching a strace of the mongo client segfault

FYI we tried to remove any system-wide library dependency and rely only on the bundled libs shipped with the mongodb sources. The resulting scons command used to build mongo is:

scons -j7 --variant-dir=build --cc=x86_64-pc-linux-gnu-gcc --cxx=x86_64-pc-linux-gnu-g++ --disable-warnings-as-errors --usev8 --ssl all

And it has the same effect (the strace comes from a mongo shell client built using these scons options).

Cheers

Comment by Ramon Fernandez Marina [ 03/Nov/14 ]

I misread the logs earlier on, my apologies for that. Further examination shows that the crash is triggered when running mapReduce and that all crashes are happening inside V8 (v8::internal::OS::Allocate), so my recommendation would be to revisit your build procedure first. This could be a bug in V8, or some artifact triggered by the custom build. It could also be a bad interaction with the specific system libraries used for this build.

You may also try running mongod/mongo under valgrind to see if this provides additional information. Another avenue to explore is building with debugging information; if the crash reproduces then would you be able to upload the binaries? If the answer is yes please let me know and I'll send you upload information.

I'd also recommend you post this question in the mongodb-dev google group, where your question will reach a larger audience of MongoDB developers.
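
To make the reading of the stack traces above repeatable across nodes, here is a minimal Python sketch that parses frame lines in the format mongod prints and reports which component owns the faulting frame. It assumes that the frame right after the libpthread entry is where the signal was raised, and that mangled names starting with _ZN2v8 or _ZN5mongo belong to the v8 and mongo namespaces. The helper names are hypothetical and the sample lines are copied from the Primary-rs1 log in this ticket.

```python
import re

# Matches frame lines such as: /path/to/binary(symbol+0xoffset) [0xaddress]
FRAME_RE = re.compile(
    r"^\s*(?P<module>\S+?)\((?P<symbol>[^+()]*)\+0x(?P<offset>[0-9a-f]+)\)"
    r"\s+\[0x(?P<address>[0-9a-f]+)\]\s*$"
)

SAMPLE = """\
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x36) [0x5be9111ae6]
 /usr/bin/mongod(+0xbbb369) [0x5be9111369]
 /usr/bin/mongod(+0xbbb425) [0x5be9111425]
 /lib64/libpthread.so.0(+0x10460) [0x34702142460]
 /usr/bin/mongod(_ZN2v88internal2OS8AllocateEmPmb+0xf1) [0x5be933b41b]
 /usr/bin/mongod(_ZN2v88internal28CreateTranscendentalFunctionENS0_19TranscendentalCache4TypeE+0x3e) [0x5be93eb206]
""".splitlines()


def parse_frames(lines):
    """Return one dict per line that looks like a stack frame."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.groupdict())
    return frames


def owner(symbol):
    """Guess the owning component from the mangled C++ name prefix."""
    if symbol.startswith("_ZN2v8"):
        return "v8"
    if symbol.startswith("_ZN5mongo"):
        return "mongo"
    return "unknown"


def faulting_frame(frames):
    """Return the frame following the libpthread (signal entry) frame, if any."""
    for previous, current in zip(frames, frames[1:]):
        if "libpthread" in previous["module"]:
            return current
    return None


if __name__ == "__main__":
    frame = faulting_frame(parse_frames(SAMPLE))
    if frame is not None:
        print(owner(frame["symbol"]), frame["symbol"], "0x" + frame["address"])
```

The prefix check is deliberately crude. Piping the symbols through a demangler gives readable names, but it is not needed to tell V8 frames from mongo frames.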

Comment by Anthony Brodard [ 03/Nov/14 ]

Yes, the authentication schema has been updated to the 2.6 format. We followed the entire upgrade procedure.
This mongod issue has been seen on 2 shards of the same cluster. The cluster is composed of 5 shards.
Each time, the primary nodes of these two shards crashed together. Here are the logs of the crash on the primary nodes:

Primary-rs1 :

2014-11-03T14:57:05.173+0100 [conn3226] SEVERE: Invalid access at address: 0x20
2014-11-03T14:57:05.255+0100 [conn3226] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x5be9111ae6 0x5be9111369 0x5be9111425 0x34702142460 0x5be933b41b 0x5be93eb206 0x5be933c5f9 0x5be933cf99 0x5be93c0452 0x5be9320961 0x5be93c0c81 0x5be93c49a4 0x5be9081e95 0x5be9089b7b 0x5be90722ef 0x5be8b31274 0x5be8b406cf 0x5be8b99928 0x5be8b9a5e5 0x5be8b9b515
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x36) [0x5be9111ae6]
 /usr/bin/mongod(+0xbbb369) [0x5be9111369]
 /usr/bin/mongod(+0xbbb425) [0x5be9111425]
 /lib64/libpthread.so.0(+0x10460) [0x34702142460]
 /usr/bin/mongod(_ZN2v88internal2OS8AllocateEmPmb+0xf1) [0x5be933b41b]
 /usr/bin/mongod(_ZN2v88internal28CreateTranscendentalFunctionENS0_19TranscendentalCache4TypeE+0x3e) [0x5be93eb206]
 /usr/bin/mongod(_ZN2v88internal22init_fast_sin_functionEv+0x1e) [0x5be933c5f9]
 /usr/bin/mongod(_ZN2v88internal14POSIXPostSetUpEv+0x19) [0x5be933cf99]
 /usr/bin/mongod(_ZN2v88internal2V828InitializeOncePerProcessImplEv+0x52) [0x5be93c0452]
 /usr/bin/mongod(_ZN2v88internal12CallOnceImplEPlPFvPvES2_+0x71) [0x5be9320961]
 /usr/bin/mongod(_ZN2v88internal2V810InitializeEPNS0_12DeserializerE+0x25) [0x5be93c0c81]
 /usr/bin/mongod(_ZN2v86LockerC1EPNS_7IsolateE+0x72) [0x5be93c49a4]
 /usr/bin/mongod(_ZN5mongo7V8ScopeC1EPNS_14V8ScriptEngineE+0x41f) [0x5be9081e95]
 /usr/bin/mongod(_ZN5mongo14V8ScriptEngine11createScopeEv+0x31) [0x5be9089b7b]
 /usr/bin/mongod(_ZN5mongo12ScriptEngine14getPooledScopeERKSsS2_+0xcc3) [0x5be90722ef]
 /usr/bin/mongod(_ZN5mongo2mr5State4initEv+0x9a) [0x5be8b31274]
 /usr/bin/mongod(_ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x27f) [0x5be8b406cf]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x37) [0x5be8b99928]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa27) [0x5be8b9a5e5]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x263) [0x5be8b9b515]
2014-11-03T15:08:42.572+0100 ***** SERVER RESTARTED *****
2014-11-03T15:08:42.606+0100 [initandlisten] MongoDB starting : pid=22552 port=27010 dbpath=/modb 64-bit host=modb-rs1-prim
2014-11-03T15:08:42.606+0100 [initandlisten] db version v2.6.5
2014-11-03T15:08:42.606+0100 [initandlisten] git version: nogitversion
2014-11-03T15:08:42.606+0100 [initandlisten] OpenSSL version: OpenSSL 1.0.1j 15 Oct 2014
2014-11-03T15:08:42.606+0100 [initandlisten] build info: Linux modb-rs1-prim 3.9.9-hardened #1 SMP Wed May 21 11:39:34 CEST 2014 x86_64 BOOST_LIB_VERSION=1_55
2014-11-03T15:08:42.606+0100 [initandlisten] allocator: tcmalloc
2014-11-03T15:08:42.606+0100 [initandlisten] options: { config: "/etc/modb-rs1-prim.conf", net: { port: 27010, ssl: { mode: "disabled" } }, operationProfiling: { slowOpThresholdMs: 500 }, replication: { replSetName: "modb-rs1" }, security: { keyFile: "/etc/mongodb/key.auth" }, sharding: { clusterRole: "shardsvr" }, storage: { dbPath: "/modb", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/modb-rs1.log" } }
2014-11-03T15:08:42.634+0100 [initandlisten] journal dir=/modb/journal
2014-11-03T15:08:42.634+0100 [initandlisten] recover begin
 

Primary-rs2:

2014-11-03T14:57:05.159+0100 [conn2785] SEVERE: Invalid access at address: 0x20
2014-11-03T14:57:05.233+0100 [conn2785] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x59adbc1ae6 0x59adbc1369 0x59adbc1425 0x326d404b460 0x59addeb41b 0x59ade9b206 0x59addec5f9 0x59addecf99 0x59ade70452 0x59addd0961 0x59ade70c81 0x59ade749a4 0x59adb31e95 0x59adb39b7b 0x59adb222ef 0x59ad5e1274 0x59ad5f06cf 0x59ad649928 0x59ad64a5e5 0x59ad64b515
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x36) [0x59adbc1ae6]
 /usr/bin/mongod(+0xbbb369) [0x59adbc1369]
 /usr/bin/mongod(+0xbbb425) [0x59adbc1425]
 /lib64/libpthread.so.0(+0x10460) [0x326d404b460]
 /usr/bin/mongod(_ZN2v88internal2OS8AllocateEmPmb+0xf1) [0x59addeb41b]
 /usr/bin/mongod(_ZN2v88internal28CreateTranscendentalFunctionENS0_19TranscendentalCache4TypeE+0x3e) [0x59ade9b206]
 /usr/bin/mongod(_ZN2v88internal22init_fast_sin_functionEv+0x1e) [0x59addec5f9]
 /usr/bin/mongod(_ZN2v88internal14POSIXPostSetUpEv+0x19) [0x59addecf99]
 /usr/bin/mongod(_ZN2v88internal2V828InitializeOncePerProcessImplEv+0x52) [0x59ade70452]
 /usr/bin/mongod(_ZN2v88internal12CallOnceImplEPlPFvPvES2_+0x71) [0x59addd0961]
 /usr/bin/mongod(_ZN2v88internal2V810InitializeEPNS0_12DeserializerE+0x25) [0x59ade70c81]
 /usr/bin/mongod(_ZN2v86LockerC1EPNS_7IsolateE+0x72) [0x59ade749a4]
 /usr/bin/mongod(_ZN5mongo7V8ScopeC1EPNS_14V8ScriptEngineE+0x41f) [0x59adb31e95]
 /usr/bin/mongod(_ZN5mongo14V8ScriptEngine11createScopeEv+0x31) [0x59adb39b7b]
 /usr/bin/mongod(_ZN5mongo12ScriptEngine14getPooledScopeERKSsS2_+0xcc3) [0x59adb222ef]
 /usr/bin/mongod(_ZN5mongo2mr5State4initEv+0x9a) [0x59ad5e1274]
 /usr/bin/mongod(_ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x27f) [0x59ad5f06cf]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x37) [0x59ad649928]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa27) [0x59ad64a5e5]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x263) [0x59ad64b515]
2014-11-03T15:09:20.612+0100 ***** SERVER RESTARTED *****
2014-11-03T15:09:20.636+0100 [initandlisten] MongoDB starting : pid=28297 port=27010 dbpath=/modb 64-bit host=modb-rs2-prim
2014-11-03T15:09:20.641+0100 [initandlisten] db version v2.6.5
2014-11-03T15:09:20.641+0100 [initandlisten] git version: nogitversion
2014-11-03T15:09:20.641+0100 [initandlisten] OpenSSL version: OpenSSL 1.0.1h 5 Jun 2014
2014-11-03T15:09:20.641+0100 [initandlisten] build info: Linux modb-rs2-prim 3.9.9-hardened #1 SMP Wed May 21 12:26:09 CEST 2014 x86_64 BOOST_LIB_VERSION=1_53
2014-11-03T15:09:20.641+0100 [initandlisten] allocator: tcmalloc
2014-11-03T15:09:20.641+0100 [initandlisten] options: { config: "/etc/modb-rs2-prim.conf", net: { port: 27010, ssl: { mode: "disabled" } }, operationProfiling: { slowOpThresholdMs: 500 }, replication: { replSetName: "modb-rs2" }, security: { keyFile: "/etc/mongodb/key.auth" }, sharding: { clusterRole: "shardsvr" }, storage: { dbPath: "/modb", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/modb-rs2.log" } }
2014-11-03T15:09:20.655+0100 [initandlisten] journal dir=/modb/journal
2014-11-03T15:09:20.655+0100 [initandlisten] recover begin

I can't send all the logs, but I can send specific parts you need after anonymization. Tell me what you need.

I'm not sure that this is related to mongo data or configuration, because the mongo shell binary also segfaults on the affected servers. The shell runs correctly on the other servers.
If you think that this is a coincidence, I can open another issue just for the mongo shell bug.

Comment by Ramon Fernandez Marina [ 03/Nov/14 ]

anthony@1000mercis.com, a quick glance at the mongod stack trace hints at an issue with authentication. Did you upgrade your authentication schema to the 2.6 format? Did you follow the upgrade recommendations and checklists?

It would be helpful if you could upload full logs of one of the mongod nodes from startup until the segfault.

Generated at Thu Feb 08 03:39:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.