[SERVER-24386] 3.2.6 Segmentation Fault after a network problem Created: 03/Jun/16  Updated: 26/Sep/18  Resolved: 07/Nov/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Stability
Affects Version/s: 3.2.6
Fix Version/s: 3.2.11

Type: Bug Priority: Critical - P2
Reporter: Šimun Mikecin Assignee: Charlie Swanson
Resolution: Done Votes: 3
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: diagnostic.data.tar.xz, mongod.log.xz, rs.conf.txt
Issue Links:
Duplicate
is duplicated by SERVER-26868 mongodb 3.2.10 crash (Segmentation fa... Closed
is duplicated by SERVER-27106 Segfaults errors Closed
is duplicated by SERVER-30525 Segmentation Fault Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 16 (06/24/16), Query 17 (07/15/16), Query 2016-11-21

 Description   

Configuration is a replica set with one primary, one secondary and an arbiter.
Primary and secondary are running version 3.2.6 on CentOS 6 64-bit.
Arbiter is running version 2.6.9 on CentOS 6 32-bit.

After an unexplained network problem, the secondary node's mongod process became primary, and about 5 minutes later it crashed with a segmentation fault.
I have attached the log file, the diagnostic.data folder, and the rs.conf() output.



 Comments   
Comment by Githook User [ 07/Nov/16 ]

Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

Message: SERVER-24386 Use a valid OperationContext when killing $lookup's cursor
Branch: v3.2
https://github.com/mongodb/mongo/commit/fcf04452bcfb0b169380743f7041308f397e2196
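The commit title points at the failure pattern: when a stashed aggregation cursor is killed, the `$lookup` stage's own inner cursor is killed through an in-process direct client, and that client must be bound to a live operation. Below is a minimal Python model of that pattern, not MongoDB's code. All class and function names (`OperationContext`, `DirectClient`, `LookupStage`, `handle_kill_cursors`) are hypothetical stand-ins, and the model assumes that the stage's context is cleared while the cursor sits idle between batches; the `Invalid access at address: 0x20` fault in the backtrace is consistent with such a cleared (null) pointer being dereferenced at a small member offset.

```python
class OperationContext:
    """Hypothetical stand-in for a per-operation context (holds only a client name)."""

    def __init__(self, client_name):
        self.client = client_name


class DirectClient:
    """Hypothetical in-process client: every request goes through its current context."""

    def __init__(self, op_ctx=None):
        self.op_ctx = op_ctx

    def kill_cursor(self, cursor_id):
        # Analogue of DBDirectClient::say() calling getClient() on its context.
        # With no context attached this raises AttributeError on None, the
        # Python counterpart of dereferencing a null pointer.
        return f"{self.op_ctx.client} killed cursor {cursor_id}"


class LookupStage:
    """Hypothetical $lookup stage holding an open cursor on the foreign collection."""

    def __init__(self, client, cursor_id):
        self.client = client
        self.cursor_id = cursor_id

    def detach_from_operation(self):
        # Assumption: while the cursor is stashed between batches, no operation owns it.
        self.client.op_ctx = None

    def reattach_to_operation(self, op_ctx):
        self.client.op_ctx = op_ctx

    def dispose(self):
        # Tearing the stage down kills its inner cursor through the direct client.
        return self.client.kill_cursor(self.cursor_id)


def handle_kill_cursors(stage, killer_ctx, use_valid_context):
    """Model of a killCursors request destroying a stashed pipeline."""
    if use_valid_context:
        # The idea behind the fix: run the cleanup under the killing operation's context.
        stage.reattach_to_operation(killer_ctx)
    return stage.dispose()


stage = LookupStage(DirectClient(OperationContext("conn172829")), cursor_id=42)
stage.detach_from_operation()
print(handle_kill_cursors(stage, OperationContext("conn172829"), use_valid_context=True))
# prints: conn172829 killed cursor 42
```

Calling `handle_kill_cursors` with `use_valid_context=False` on a detached stage raises instead of returning, which is the model's version of the crash.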

Comment by Max Hirschhorn [ 19/Aug/16 ]

The issue described in this ticket has been addressed by the changes from SERVER-25005 (in 3.3.11). I'm updating the fixVersion of this ticket to "3.2 Required" to reflect that it tracks the work needed to resolve this issue on the 3.2 branch.

Comment by Ramon Fernandez Marina [ 03/Jun/16 ]

simun.mikecin@me.com, we've been able to identify the root cause of the problem and are working on a fix.

This bug was triggered by an aggregation command; from your logs:

2016-06-02T16:36:45.451+0200 I COMMAND  [conn172829] command live.campaigns command: aggregate { aggregate: "campaigns", pipeline: [ { $lookup: { from: "tasks", localField: "_id", foreignField: "campaignId", as: "tasks" } } ], cursor: {} } keyUpdates:0 writeConflicts:0 numYields:0 reslen:2575499 locks:{ Global: { acquireCount: { r: 2958 } }, Database: { acquireCount: { r: 1479 } }, Collection: { acquireCount: { r: 1479 } } } protocol:op_command 161ms
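For reference, the `$lookup` stage in this command performs an equality match: each `campaigns` document is kept, and every `tasks` document whose `campaignId` equals the campaign's `_id` is collected into a new `tasks` array. The in-memory Python sketch below illustrates only those semantics; the `lookup` function and the sample documents are made up for illustration, and array-valued fields are not modeled.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Equality-match join modeled on $lookup: every local document is kept,
    and the matching foreign documents are gathered into an array (possibly empty)."""
    out = []
    for doc in local_docs:
        key = doc.get(local_field)
        matches = [f for f in foreign_docs if f.get(foreign_field) == key]
        out.append({**doc, as_field: matches})
    return out


# Made-up sample data shaped like the collections in the command above.
campaigns = [{"_id": 1, "name": "spring"}, {"_id": 2, "name": "fall"}]
tasks = [{"_id": "t1", "campaignId": 1}, {"_id": "t2", "campaignId": 1}]
for doc in lookup(campaigns, tasks, "_id", "campaignId", "tasks"):
    print(doc["name"], len(doc["tasks"]))
# spring 2
# fall 0
```

A campaign with no tasks still appears, with an empty `tasks` array, which is what makes `$lookup` a left outer join rather than an inner join.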

We'll post updates on this ticket as they happen. Thanks for reporting this bug!

Regards,
Ramón.

Comment by Ramon Fernandez Marina [ 03/Jun/16 ]

And here's the parsed stack trace, symbolized with addr2line against the 3.2.6 mongod binary and its debug symbols:

addr2line -ipfC cpp -e mongodb-linux-x86_64-rhel62-3.2.6/mongod.debug 0x13336a2 0x13325d9 0x1332958 0x7f108a6fc710 0xd149b1 0xbbe83a 0xa018a9 0xa14615 0xa148cb 0xa14cd1 0xd64b20 0xd64c41 0xdce1c6 0xdce275 0xdce363 0xb62670 0xb62731 0xe2eecf 0xafd473 0xadfee3 0xae059c 0xae081c 0xcc6561 0xccd763 0x99ad3c 0x12df01d 0x7f108a6f49d1 0x7f108a441b6d
?? ??:0
mongo::printStackTrace(std::ostream&) at /data/mci/src/src/mongo/util/stacktrace_posix.cpp:172
printSignalAndBacktrace at /data/mci/src/src/mongo/util/signal_handlers_synchronous.cpp:182
abruptQuitWithAddrSignal at /data/mci/src/src/mongo/util/signal_handlers_synchronous.cpp:277
?? ??:0
mongo::OperationContext::getClient() const at /data/mci/src/src/mongo/db/operation_context.cpp:46
DirectClientScope at /data/mci/src/src/mongo/db/dbdirectclient.cpp:60
 (inlined by) mongo::DBDirectClient::say(mongo::Message&, bool, std::string*) at /data/mci/src/src/mongo/db/dbdirectclient.cpp:141
mongo::Message::reset() at /data/mci/src/src/mongo/util/net/message.h:504
 (inlined by) ~Message at /data/mci/src/src/mongo/util/net/message.h:420
 (inlined by) mongo::DBClientBase::killCursor(long long) at /data/mci/src/src/mongo/client/dbclient.cpp:1271
mongo::DBClientCursor::kill() at /data/mci/src/src/mongo/client/dbclientcursor.cpp:517 (discriminator 1)
std::string::_M_data() const at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/basic_string.h:293
 (inlined by) std::string::_M_rep() const at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/basic_string.h:301
 (inlined by) ~basic_string at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/basic_string.h:539
 (inlined by) mongo::DBClientCursor::~DBClientCursor() at /data/mci/src/src/mongo/client/dbclientcursor.cpp:512
mongo::DBClientCursor::~DBClientCursor() at /data/mci/src/src/mongo/client/dbclientcursor.cpp:514
~intrusive_ptr at /data/mci/src/src/third_party/boost-1.56.0/boost/smart_ptr/intrusive_ptr.hpp:97
 (inlined by) mongo::DocumentSourceLookUp::~DocumentSourceLookUp() at /data/mci/src/src/mongo/db/pipeline/document_source.h:1262
mongo::DocumentSourceLookUp::~DocumentSourceLookUp() at /data/mci/src/src/mongo/db/pipeline/document_source.h:1262
__destroy<boost::intrusive_ptr<mongo::DocumentSource>*> at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/stl_construct.h:102
 (inlined by) _Destroy<boost::intrusive_ptr<mongo::DocumentSource>*> at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/stl_construct.h:126
 (inlined by) _Destroy<boost::intrusive_ptr<mongo::DocumentSource>*, boost::intrusive_ptr<mongo::DocumentSource> > at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/stl_construct.h:151
 (inlined by) std::deque<boost::intrusive_ptr<mongo::DocumentSource>, std::allocator<boost::intrusive_ptr<mongo::DocumentSource> > >::_M_destroy_data_aux(std::_Deque_iterator<boost::intrusive_ptr<mongo::DocumentSource>, boost::intrusive_ptr<mongo::DocumentSource>&, boost::intrusive_ptr<mongo::DocumentSource>*>, std::_Deque_iterator<boost::intrusive_ptr<mongo::DocumentSource>, boost::intrusive_ptr<mongo::DocumentSource>&, boost::intrusive_ptr<mongo::DocumentSource>*>) at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/deque.tcc:813
~_Deque_base at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/stl_deque.h:563
 (inlined by) std::deque<boost::intrusive_ptr<mongo::DocumentSource>, std::allocator<boost::intrusive_ptr<mongo::DocumentSource> > >::~deque() at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/stl_deque.h:918
~IntrusiveCounter at /data/mci/src/src/mongo/util/intrusive_counter.h:63
 (inlined by) ~IntrusiveCounterUnsigned at /data/mci/src/src/mongo/util/intrusive_counter.h:77
 (inlined by) ~Pipeline at /data/mci/src/src/mongo/db/pipeline/pipeline.h:55
 (inlined by) mongo::Pipeline::~Pipeline() at /data/mci/src/src/mongo/db/pipeline/pipeline.h:55
~PlanStage at /data/mci/src/src/mongo/db/exec/plan_stage.h:110
 (inlined by) mongo::PipelineProxyStage::~PipelineProxyStage() at /data/mci/src/src/mongo/db/exec/pipeline_proxy.h:45
mongo::PipelineProxyStage::~PipelineProxyStage() at /data/mci/src/src/mongo/db/exec/pipeline_proxy.h:45
~unique_ptr at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/unique_ptr.h:183
 (inlined by) mongo::PlanExecutor::~PlanExecutor() at /data/mci/src/src/mongo/db/query/plan_executor.cpp:214
std::default_delete<mongo::PlanExecutor>::operator()(mongo::PlanExecutor*) const at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/unique_ptr.h:67
 (inlined by) ~unique_ptr at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/unique_ptr.h:184
 (inlined by) mongo::ClientCursor::~ClientCursor() at /data/mci/src/src/mongo/db/clientcursor.cpp:134
mongo::CursorManager::eraseCursor(mongo::OperationContext*, long long, bool) at /data/mci/src/src/mongo/db/catalog/cursor_manager.cpp:538 (discriminator 1)
mongo::Status::code() const at /data/mci/src/src/mongo/base/status-inl.h:72
 (inlined by) mongo::GlobalCursorIdCache::eraseCursor(mongo::OperationContext*, long long, bool) at /data/mci/src/src/mongo/db/catalog/cursor_manager.cpp:236
mongo::CursorManager::eraseCursorGlobalIfAuthorized(mongo::OperationContext*, int, char const*) at /data/mci/src/src/mongo/db/catalog/cursor_manager.cpp:299
mongo::receivedKillCursors(mongo::OperationContext*, mongo::Message&) at /data/mci/src/src/mongo/db/instance.cpp:637
mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) at /data/mci/src/src/mongo/db/instance.cpp:550
~basic_string at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/basic_string.h:539
 (inlined by) ~HostAndPort at /data/mci/src/src/mongo/util/net/hostandport.h:49
 (inlined by) mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*) at /data/mci/src/src/mongo/db/db.cpp:177
mongo::PortMessageServer::handleIncomingMsg(void*) at /data/mci/src/src/mongo/util/net/message_server_port.cpp:231
?? ??:0
?? ??:0
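Read from the bottom up, the trace shows that the killCursors path never calls into `$lookup` directly: erasing the `ClientCursor` destroys the objects it owns, each destructor tears down its own members, and the chain ends in `DocumentSourceLookUp`'s `DBClientCursor` trying to send its own kill request. The Python sketch below models only that ownership chain; the type names mirror frames in the trace (the `std::deque` frames are folded into `Pipeline`), while the `destroy` helper and the `calls` log are illustrative.

```python
# Ownership chain taken from the symbolized frames, outermost owner first.
# Each entry is (type name, list of objects it owns).
OWNERSHIP = ("ClientCursor", [
    ("PlanExecutor", [
        ("PipelineProxyStage", [
            ("Pipeline", [
                ("DocumentSourceLookUp", [
                    ("DBClientCursor", []),
                ]),
            ]),
        ]),
    ]),
])


def destroy(node, log):
    """Record each destructor as it is entered; owned objects are torn down
    during their owner's destruction, so the calls nest."""
    name, owned = node
    log.append(f"~{name}")
    for child in owned:
        destroy(child, log)
    if name == "DBClientCursor":
        # A still-open cursor is killed from its destructor, which sends a
        # request through the direct client and therefore needs a live context.
        log.append("DBClientCursor::kill -> DBDirectClient::say")


calls = []
destroy(OWNERSHIP, calls)
for depth, call in enumerate(calls):
    print("  " * depth + call)
```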

Comment by Šimun Mikecin [ 03/Jun/16 ]

SELinux is disabled:

  # sestatus
  SELinux status: disabled

Comment by Ramon Fernandez Marina [ 03/Jun/16 ]

Thanks for your report, simun.mikecin@me.com. Two things:

  • You should consider upgrading your arbiter to 3.2.6 as soon as possible.
  • Are you running SELinux on the affected node? What's the output of sestatus?

Looking at the backtrace now.

Comment by Ramon Fernandez Marina [ 03/Jun/16 ]

Here's the backtrace from the logs:

2016-06-02T16:37:21.793+0200 F -        [conn172829] Invalid access at address: 0x20
2016-06-02T16:37:21.870+0200 F -        [conn172829] Got signal: 11 (Segmentation fault).
 
 0x13336a2 0x13325d9 0x1332958 0x7f108a6fc710 0xd149b1 0xbbe83a 0xa018a9 0xa14615 0xa148cb 0xa14cd1 0xd64b20 0xd64c41 0xdce1c6 0xdce275 0xdce363 0xb62670 0xb62731 0xe2eecf 0xafd473 0xadfee3 0xae059c 0xae081c 0xcc6561 0xccd763 0x99ad3c 0x12df01d 0x7f108a6f49d1 0x7f108a441b6d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F336A2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F325D9"},{"b":"400000","o":"F32958"},{"b":"7F108A6ED000","o":"F710"},{"b":"400000","o":"9149B1","s":"_ZNK5mongo16OperationContext9getClientEv"},{"b":"400000","o":"7BE83A","s":"_ZN5mongo14DBDirectClient3sayERNS_7MessageEbPSs"},{"b":"400000","o":"6018A9","s":"_ZN5mongo12DBClientBase10killCursorEx"},{"b":"400000","o":"614615","s":"_ZN5mongo14DBClientCursor4killEv"},{"b":"400000","o":"6148CB","s":"_ZN5mongo14DBClientCursorD1Ev"},{"b":"400000","o":"614CD1","s":"_ZN5mongo14DBClientCursorD0Ev"},{"b":"400000","o":"964B20","s":"_ZN5mongo20DocumentSourceLookUpD1Ev"},{"b":"400000","o":"964C41","s":"_ZN5mongo20DocumentSourceLookUpD0Ev"},{"b":"400000","o":"9CE1C6","s":"_ZNSt5dequeIN5boost13intrusive_ptrIN5mongo14DocumentSourceEEESaIS4_EE19_M_destroy_data_auxESt15_Deque_iteratorIS4_RS4_PS4_ESA_"},{"b":"400000","o":"9CE275","s":"_ZNSt5dequeIN5boost13intrusive_ptrIN5mongo14DocumentSourceEEESaIS4_EED1Ev"},{"b":"400000","o":"9CE363","s":"_ZN5mongo8PipelineD0Ev"},{"b":"400000","o":"762670","s":"_ZN5mongo18PipelineProxyStageD1Ev"},{"b":"400000","o":"762731","s":"_ZN5mongo18PipelineProxyStageD0Ev"},{"b":"400000","o":"A2EECF","s":"_ZN5mongo12PlanExecutorD1Ev"},{"b":"400000","o":"6FD473","s":"_ZN5mongo12ClientCursorD1Ev"},{"b":"400000","o":"6DFEE3","s":"_ZN5mongo13CursorManager11eraseCursorEPNS_16OperationContextExb"},{"b":"400000","o":"6E059C","s":"_ZN5mongo19GlobalCursorIdCache11eraseCursorEPNS_16OperationContextExb"},{"b":"400000","o":"6E081C","s":"_ZN5mongo13CursorManager29eraseCursorGlobalIfAuthorizedEPNS_16OperationContextEiPKc"},{"b":"400000","o":"8C6561","s":"_ZN5mongo19receivedKillCursorsEPNS_16OperationContextERNS_7MessageE"},{"b":"400000","o":"8CD763","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"59AD3C","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000",
"o":"EDF01D","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F108A6ED000","o":"79D1"},{"b":"7F108A359000","o":"E8B6D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.6", "gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.32-431.el6.x86_64", "version" : "#1 SMP Fri Nov 22 03:15:09 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F353F83EC51F1F6EB9AD4C20A41389253348AA5B" }, { "b" : "7FFF546FF000", "elfType" : 3, "buildId" : "81A81BE2E44C93640ADEDB62ADC93A47F4A09DD1" }, { "b" : "7F108B590000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "318EAB33420B000D542F09B91B716BACAB1AD546" }, { "b" : "7F108B1B0000", "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "3A8D65B9A373C0AFAF106F3A979835B16DBEFF1A" }, { "b" : "7F108AFA8000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "B26528BF6C0636AC1CAE5AC50BDBC07E60851DF4" }, { "b" : "7F108ADA4000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "AFC7448F2F2F6ED4E5BC82B1BD8A7320B84A9D48" }, { "b" : "7F108AB20000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "98B028A725D6E93253F25DF00B794DFAA66A3145" }, { "b" : "7F108A90A000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "A44499D29B114A5366CD72DD4883958495AC1C1D" }, { "b" : "7F108A6ED000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "1BB4E10307D6B94223749CFDF2AD14C365972C60" }, { "b" : "7F108A359000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "F1A1C0575F0EC141A157E5DFA4525E70BD27B62E" }, { "b" : "7F108B7FC000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "57BF668F99B7F5917B8D55FBB645173C9A644575" }, { "b" : "7F108A115000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "EFF68B7DE77D081BC4A0CB38FE9DCBC60541BF92" }, { "b" : "7F1089E2F000", "path" : "/lib64/libkrb5.so.3", "elfType" : 
3, "buildId" : "95EBB74C2C0A1E1714344036145A0239FFA4892D" }, { "b" : "7F1089C2B000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6A22EDFF4D4F04A57573E3D1536B6B4963159CD5" }, { "b" : "7F10899FF000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "D02E7D3149950118009A81997434E28B7D9EC9B2" }, { "b" : "7F10897E9000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "b" : "7F10895DE000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "5AFCBEA0D62EE0335714CCBAB7BA808E2A16028C" }, { "b" : "7F10893DB000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "b" : "7F10891C1000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "56843351EFB2CE304A7E4BD0754991613E9EC8BD" }, { "b" : "7F1088FA2000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x13336a2]
 mongod(+0xF325D9) [0x13325d9]
 mongod(+0xF32958) [0x1332958]
 libpthread.so.0(+0xF710) [0x7f108a6fc710]
 mongod(_ZNK5mongo16OperationContext9getClientEv+0x1) [0xd149b1]
 mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageEbPSs+0x2A) [0xbbe83a]
 mongod(_ZN5mongo12DBClientBase10killCursorEx+0x129) [0xa018a9]
 mongod(_ZN5mongo14DBClientCursor4killEv+0x65) [0xa14615]
 mongod(_ZN5mongo14DBClientCursorD1Ev+0x2B) [0xa148cb]
 mongod(_ZN5mongo14DBClientCursorD0Ev+0x11) [0xa14cd1]
 mongod(_ZN5mongo20DocumentSourceLookUpD1Ev+0x50) [0xd64b20]
 mongod(_ZN5mongo20DocumentSourceLookUpD0Ev+0x11) [0xd64c41]
 mongod(_ZNSt5dequeIN5boost13intrusive_ptrIN5mongo14DocumentSourceEEESaIS4_EE19_M_destroy_data_auxESt15_Deque_iteratorIS4_RS4_PS4_ESA_+0xB6) [0xdce1c6]
 mongod(_ZNSt5dequeIN5boost13intrusive_ptrIN5mongo14DocumentSourceEEESaIS4_EED1Ev+0x65) [0xdce275]
 mongod(_ZN5mongo8PipelineD0Ev+0x43) [0xdce363]
 mongod(_ZN5mongo18PipelineProxyStageD1Ev+0xC0) [0xb62670]
 mongod(_ZN5mongo18PipelineProxyStageD0Ev+0x11) [0xb62731]
 mongod(_ZN5mongo12PlanExecutorD1Ev+0x7F) [0xe2eecf]
 mongod(_ZN5mongo12ClientCursorD1Ev+0x63) [0xafd473]
 mongod(_ZN5mongo13CursorManager11eraseCursorEPNS_16OperationContextExb+0x1E3) [0xadfee3]
 mongod(_ZN5mongo19GlobalCursorIdCache11eraseCursorEPNS_16OperationContextExb+0x40C) [0xae059c]
 mongod(_ZN5mongo13CursorManager29eraseCursorGlobalIfAuthorizedEPNS_16OperationContextEiPKc+0x3C) [0xae081c]
 mongod(_ZN5mongo19receivedKillCursorsEPNS_16OperationContextERNS_7MessageE+0x191) [0xcc6561]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xF83) [0xccd763]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x99ad3c]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x26D) [0x12df01d]
 libpthread.so.0(+0x79D1) [0x7f108a6f49d1]
 libc.so.6(clone+0x6D) [0x7f108a441b6d]
-----  END BACKTRACE  -----
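Each frame in the JSON backtrace gives a module load base `b` and an offset `o`, both in hex; their sum is the absolute address shown in the address list at the top of the backtrace and passed to addr2line in the earlier comment (for example, 0x400000 + 0xF336A2 = 0x13336a2). A small Python sketch that recovers those addresses, assuming the JSON object between the BEGIN/END markers has been copied out intact; the sample below is abbreviated to two frames taken from this backtrace, and `absolute_addresses` is an illustrative helper name.

```python
import json


def absolute_addresses(backtrace_json):
    """Return hex addresses (base + offset) for every frame in a mongod JSON backtrace."""
    frames = json.loads(backtrace_json)["backtrace"]
    return [hex(int(f["b"], 16) + int(f["o"], 16)) for f in frames]


sample = (
    '{"backtrace":[{"b":"400000","o":"F336A2","s":"_ZN5mongo15printStackTraceERSo"},'
    '{"b":"7F108A6ED000","o":"F710"}]}'
)
print(" ".join(absolute_addresses(sample)))
# prints: 0x13336a2 0x7f108a6fc710
```

Addresses inside the mongod image can be resolved against mongod.debug as in the earlier addr2line command; frames in shared libraries such as libpthread need that library's own symbols, which is why they resolve to `?? ??:0` there.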

Generated at Thu Feb 08 04:06:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.