[SERVER-3955] mongod seg fault related to replica sets Created: 26/Sep/11  Updated: 11/Jul/16  Resolved: 04/Oct/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.0
Fix Version/s: 2.0.1, 2.1.0

Type: Bug
Priority: Major - P3
Reporter: Dwight Merriman
Assignee: Kristina Chodorow (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

See below. The configuration was unusual, and members 1 and 2 were also down at the time. Further, mongos was calling isMaster and replSetGetStatus thousands of times per second, so a race condition seems likely.

Crash was not during a reconfig but later. Sharded environment.

PRIMARY> rs.conf()
{
        "_id" : "zzz",
        "version" : 3,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "x:27162"
                },
                {
                        "_id" : 1,
                        "host" : "x:27192",
                        "votes" : 0,
                        "arbiterOnly" : true
                },
                {
                        "_id" : 2,
                        "host" : "x:27193",
                        "votes" : 0,
                        "arbiterOnly" : true
                }
        ]
}

Mon Sep 26 20:13:36 [conn12] query mydb.Mapping_E13_MM ntoreturn:2 idhack:1 reslen:20 106ms
Mon Sep 26 20:13:36 [conn75] query mydb.Mapping_E13_MM ntoreturn:2 idhack:1 reslen:112 106ms
Mon Sep 26 20:13:36 [conn74] query mydb.Mapping_E13_MM ntoreturn:2 idhack:1 reslen:112 107ms
Mon Sep 26 20:14:30 [clientcursormon] mem (MB) res:45634 virt:492036 mapped:245801
*** glibc detected *** /apps/mongodb/bin/mongod: malloc(): memory corruption: 0x00007f4ed8000088 ***
======= Backtrace: =========
Mon Sep 26 20:14:54 Invalid access at address: 0
 
/lib/libc.so.6(+0x71ad6)[0x7fc6ef293ad6]
Mon Sep 26 20:14:54 Got signal: 11 (Segmentation fault).
 
/lib/libc.so.6(+0x74b6d)[0x7fc6ef296b6d]
/lib/libc.so.6(__libc_malloc+0x70)[0x7fc6ef298930]
/usr/lib/libstdc++.so.6(_Znwm+0x1d)[0x7fc6efae66bd]
/usr/lib/libstdc++.so.6(_ZNSs4_Rep9_S_createEmmRKSaIcE+0x59)[0x7fc6efac2b29]
/usr/lib/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm+0x2b)[0x7fc6efac3aeb]
/usr/lib/libstdc++.so.6(_ZNSs7reserveEm+0x3c)[0x7fc6efac405c]
/usr/lib/libstdc++.so.6(_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE8overflowEi+0xb1)[0x7fc6efabe021]
/usr/lib/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x35)[0x7fc6efac2215]
/usr/lib/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x1b5)[0x7fc6efab83b5]
/usr/lib/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x2f)[0x7fc6efab862f]
/apps/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x15b)[0xa844db]
/lib/libpthread.so.0(+0xef60)[0x7fc6efd3df60]
Mon Sep 26 20:14:54 /lib/libc.so.6(memcpy+0x2f7)[0x7fc6ef2a1a47]
Backtrace:
0xa83fc9 0xa845a0 0x7fc6efd3df60 0x7fc6efa8d173 0x67a375 0x985e77 0x973b49 0x97512f 0x95d725 0x9607b4 0x87e037 0x88485c 0xa96a46 0x635dd7 0x7fc6efd358ba 0x7fc6ef2f102d 
 /apps/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0xa83fc9]
 /apps/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0xa845a0]
 /lib/libpthread.so.0(+0xef60) [0x7fc6efd3df60]
 /usr/lib/libstdc++.so.6(_ZSt18_Rb_tree_incrementPSt18_Rb_tree_node_base+0x13) [0x7fc6efa8d173]
 /apps/mongodb/bin/mongod(_ZN5mongo9MongoFile17totalMappedLengthEv+0xf5) [0x67a375]
 /apps/mongodb/bin/mongod(_ZN5mongo15CmdServerStatus3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0xed7) [0x985e77]
 /apps/mongodb/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x6a9) [0x973b49]
 /apps/mongodb/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6ff) [0x97512f]
 /apps/mongodb/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x35) [0x95d725]
 /apps/mongodb/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xee4) [0x9607b4]
 /apps/mongodb/bin/mongod() [0x87e037]
 /apps/mongodb/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x55c) [0x88485c]
 /apps/mongodb/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76) [0xa96a46]
 /apps/mongodb/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x635dd7]
 /lib/libpthread.so.0(+0x68ba) [0x7fc6efd358ba]
 /lib/libc.so.6(clone+0x6d) [0x7fc6ef2f102d]
 
/usr/lib/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm+0x76)[0x7fc6efac3b36]
/usr/lib/libstdc++.so.6(_ZNSsC1ERKSs+0x3c)[0x7fc6efac3e0c]
/apps/mongodb/bin/mongod(_ZNK5mongo11ReplSetImpl16_summarizeStatusERNS_14BSONObjBuilderE+0x8da)[0x7ea7aa]
/apps/mongodb/bin/mongod(_ZN5mongo19CmdReplSetGetStatus3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x118)[0x7dd448]
/apps/mongodb/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x6a9)[0x973b49]
/apps/mongodb/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6ff)[0x97512f]
/apps/mongodb/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x35)[0x95d725]
/apps/mongodb/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xee4)[0x9607b4]
/apps/mongodb/bin/mongod[0x87e037]
/apps/mongodb/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x55c)[0x88485c]
/apps/mongodb/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76)[0xa96a46]
/apps/mongodb/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287)[0x635dd7]
/lib/libpthread.so.0(+0x68ba)[0x7fc6efd358ba]



 Comments   
Comment by auto [ 09/Oct/11 ]

Author: dwight <dwight@10gen.com>

Message: SERVER-3955 bug in diagstr::operator=. backport.
Branch: v2.0
https://github.com/mongodb/mongo/commit/05916b4e7125e1ddeeaaa3cb627fd74e670a6674

Comment by auto [ 09/Oct/11 ]

Author: dwight <dwight@10gen.com>

Message: SERVER-3955. concurrency bug in DiagStr.
in addition, a thread-safe map impl.
Branch: v2.0
https://github.com/mongodb/mongo/commit/6728c042dd09ecb4ec3dada8e15ffb892435f1fc
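
For readers unfamiliar with this class of bug, the following is a minimal sketch of a string holder and a map that are safe to read and write from several threads. It is not the code from the commit above: `GuardedString` and `GuardedMap` are hypothetical names, and `std::mutex` is an assumption made for illustration (the 2.0 code base predates C++11). The idea shown is only that every read returns a private copy taken under a lock, and that assignment never touches the right-hand side without holding its lock.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical illustration; NOT the actual DiagStr from the MongoDB source.
class GuardedString {
public:
    GuardedString() = default;
    GuardedString(const GuardedString& other) : value_(other.get()) {}

    // Copy the right-hand side under its own lock first, then store the copy
    // under ours, so the two locks are never held at the same time.
    GuardedString& operator=(const GuardedString& rhs) {
        if (this != &rhs) set(rhs.get());
        return *this;
    }
    GuardedString& operator=(const std::string& s) {
        set(s);
        return *this;
    }

    // Returned by value: the caller owns a copy no other thread can modify.
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
    void set(const std::string& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = s;
    }

private:
    mutable std::mutex mutex_;
    std::string value_;
};

// Hypothetical thread-safe map: every operation holds the lock, and lookups
// copy the value out instead of handing back a reference or an iterator.
template <typename K, typename V>
class GuardedMap {
public:
    void put(const K& key, const V& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[key] = value;
    }
    bool get(const K& key, V& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<K, V> map_;
};
```

Returning copies costs an allocation per read, but it means a status command never walks a structure that another thread may be changing, which is the pattern both stacks in the description appear to be caught in.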

Comment by auto [ 03/Oct/11 ]

Author: dwight <dwight@10gen.com>

Message: SERVER-3955 bug in diagstr::operator=. backport.
Branch: master
https://github.com/mongodb/mongo/commit/3f05f1da08026c6e901563b93fa8d4caf478ff11

Comment by auto [ 03/Oct/11 ]

Message: SERVER-3955. concurrency bug in DiagStr.
https://github.com/mongodb/mongo/commit/85ea4bce56b58f45113df08d5450f2e64463be77

This might be it, but there might be something else.

Please keep the ticket open until a test is written that really hammers a server running in the above scenario. Probably easy: just take an existing rs test and tweak it slightly.
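
As an in-process analogue of the test requested above (not the replica set test itself, which would need a running mongod), here is a rough sketch of a harness that hammers one shared string from several threads. The function name `hammerSharedString`, the state values, and the use of `std::thread` are all assumptions made for illustration. One writer flips the string between two values while the readers copy it and check that each copy is one of those two values.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stress harness. Returns the number of copies that were
// neither "PRIMARY" nor "SECONDARY"; with the lock in place this should be 0.
inline int hammerSharedString(int readers, int iterations) {
    std::mutex m;
    std::string state = "PRIMARY";
    std::atomic<int> bad{0};

    // Readers stand in for the status commands arriving from mongos.
    auto reader = [&] {
        for (int i = 0; i < iterations; ++i) {
            std::string copy;
            {
                std::lock_guard<std::mutex> lock(m);
                copy = state;
            }
            if (copy != "PRIMARY" && copy != "SECONDARY") ++bad;
        }
    };

    // The writer stands in for the thread that updates member state.
    auto writer = [&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(m);
            state = (i % 2 == 0) ? "SECONDARY" : "PRIMARY";
        }
    };

    std::vector<std::thread> threads;
    threads.emplace_back(writer);
    for (int i = 0; i < readers; ++i) threads.emplace_back(reader);
    for (auto& t : threads) t.join();
    return bad.load();
}
```

A harness like this only shows that the locked version behaves. To expose an unlocked variant reliably, building with a race detector such as ThreadSanitizer (`-fsanitize=thread` in GCC and Clang) is likely more productive than waiting for a crash.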

Comment by Dwight Merriman [ 01/Oct/11 ]

"memory corruption" is the error as far as I can tell, which means it could be anything. But their repl set config changed 3 minutes earlier, to a weird config, and the server was getting over 1000 replSetGetStatus calls per second; hence the repl set suspicion.

Comment by Kristina Chodorow (Inactive) [ 27/Sep/11 ]

I'm not sure I'm reading this correctly, but isn't the segfault in MongoFile::totalMappedLength()? It looks like there just happened to be a thread calling replSetGetStatus at the same time.
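
Reading the frames is easier with the symbols demangled. The sketch below assumes a GCC or Clang toolchain, where `abi::__cxa_demangle` from `<cxxabi.h>` is available; the wrapper name `demangle` is ours. The `c++filt` command-line tool performs the same conversion.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if it cannot be demangled.
inline std::string demangle(const std::string& mangled) {
    int status = 0;
    // With null buffer arguments the result is malloc-allocated, so it must
    // be released with free() once copied.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) return mangled;
    std::string result(raw);
    std::free(raw);
    return result;
}
```

For example, `demangle("_ZN5mongo9MongoFile17totalMappedLengthEv")` yields `mongo::MongoFile::totalMappedLength()`, the frame just above `_Rb_tree_increment` in the thread that received the signal.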
