[SERVER-15620] repl_coordinator_impl_test fails with segfault on RHEL5.5 with gcc 4.8.2
Created: 13/Oct/14  Updated: 11/Jul/16  Resolved: 13/Oct/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.7.8

Type: Bug
Priority: Major - P3
Reporter: Jonathan Reams
Assignee: Andy Schwerin
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL

Description

When compiled with gcc 4.8.2 on RHEL 5.5 and run on a c3.8xlarge EC2 instance, the repl_coordinator_impl_test unit test sporadically fails with a segmentation fault and sometimes hangs indefinitely.

The output from a failing run is included below:

[2014/10/13 12:12:53.739] 2014-10-13T16:12:53.733+0000 I REPLSETS transition to PRIMARY
[2014/10/13 12:12:53.739] 2014-10-13T16:12:53.733+0000 I REPLSETS transition to SECONDARY
[2014/10/13 12:12:53.739] 2014-10-13T16:12:53.733+0000 F -        Invalid access at address: 0xc
[2014/10/13 12:12:53.740] 2014-10-13T16:12:53.735+0000 F -        Got signal: 11 (Segmentation fault).
[2014/10/13 12:12:53.740]  0x8269f37 0x82697da 0x8269ba6 0xffffe600 0xf7db5d2d 0x81c053c 0x81ca4f9 0x81fb189 0x81f4bb0 0x81faea1 0x81f7d48 0x81ca56b 0x82bb2f4 0xf7db3912 0xf7d264ae
[2014/10/13 12:12:53.740] ----- BEGIN BACKTRACE -----
[2014/10/13 12:12:53.742] {"backtrace":[{"b":"8048000","o":"221F37"},{"b":"8048000","o":"2217DA"},{"b":"8048000","o":"221BA6"},{"b":"FFFFE000","o":"600"},{"b":"F7DAE000","o":"7D2D"},{"b":"8048000","o":"17853C"},{"b":"8048000","o":"1824F9"},{"b":"8048000","o":"1B3189"},{"b":"8048000","o":"1ACBB0"},{"b":"8048000","o":"1B2EA1"},{"b":"8048000","o":"1AFD48"},{"b":"8048000","o":"18256B"},{"b":"8048000","o":"2732F4"},{"b":"F7DAE000","o":"5912"},{"b":"F7C51000","o":"D54AE"}],"processInfo":{ "mongodbVersion" : "2.7.8-pre-", "gitVersion" : "04881187a924504df7e0de339c5adeeffae9371d", "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "8048000" }, { "elfType" : 3 }, { "b" : "F7EEB000", "path" : "/lib/librt.so.1", "elfType" : 3 }, { "b" : "F7EE6000", "path" : "/lib/libdl.so.2", "elfType" : 3 }, { "b" : "F7DFD000", "path" : "/usr/lib/libstdc++.so.6", "elfType" : 3 }, { "b" : "F7DD4000", "path" : "/lib/libm.so.6", "elfType" : 3 }, { "b" : "F7DC8000", "path" : "/lib/libgcc_s.so.1", "elfType" : 3 }, { "b" : "F7DAE000", "path" : "/lib/libpthread.so.0", "elfType" : 3 }, { "b" : "F7C51000", "path" : "/lib/libc.so.6", "elfType" : 3 }, { "b" : "F7F01000", "path" : "/lib/ld-linux.so.2", "elfType" : 3 } ] }}
[2014/10/13 12:12:53.742]  repl_coordinator_impl_test(_ZN5mongo15printStackTraceERSo+0x37) [0x8269f37]
[2014/10/13 12:12:53.742]  repl_coordinator_impl_test(+0x2217DA) [0x82697da]
[2014/10/13 12:12:53.742]  repl_coordinator_impl_test(+0x221BA6) [0x8269ba6]
[2014/10/13 12:12:53.742]  (__kernel_rt_sigreturn+0x0) [0xffffe600]
[2014/10/13 12:12:53.742]  libpthread.so.0(pthread_mutex_lock+0x1D) [0xf7db5d2d]
[2014/10/13 12:12:53.743]  repl_coordinator_impl_test(_ZN5mongo4repl26ReplicationCoordinatorImpl15_stepDownFinishERKNS0_19ReplicationExecutor12CallbackDataERKNS_6Date_tEPNS_6StatusE+0x1BC) [0x81c053c]
[2014/10/13 12:12:53.743]  repl_coordinator_impl_test(_ZNSt17_Function_handlerIFvRKN5mongo4repl19ReplicationExecutor12CallbackDataEESt5_BindIFSt7_Mem_fnIMNS1_26ReplicationCoordinatorImplEFvS5_RKNS0_6Date_tEPNS0_6StatusEEEPS9_St12_PlaceholderILi1EESA_SE_EEE9_M_invokeERKSt9_Any_dataS5_+0x39) [0x81ca4f9]
[2014/10/13 12:12:53.743]  repl_coordinator_impl_test(_ZNSt17_Function_handlerIFvvESt5_BindIFSt8functionIFvRKN5mongo4repl19ReplicationExecutor12CallbackDataEEES6_EEE9_M_invokeERKSt9_Any_data+0x29) [0x81fb189]
[2014/10/13 12:12:53.743]  repl_coordinator_impl_test(+0x1ACBB0) [0x81f4bb0]
[2014/10/13 12:12:53.744]  repl_coordinator_impl_test(_ZNSt17_Function_handlerIFvvESt5_BindIFPFvRKSt8functionIS0_EES3_EEE9_M_invokeERKSt9_Any_data+0x11) [0x81faea1]
[2014/10/13 12:12:53.744]  repl_coordinator_impl_test(_ZN5mongo4repl19ReplicationExecutor3runEv+0x4B8) [0x81f7d48]
[2014/10/13 12:12:53.744]  repl_coordinator_impl_test(_ZN5boost6detail11thread_dataISt5_BindIFSt7_Mem_fnIMN5mongo4repl19ReplicationExecutorEFvvEEPS6_EEE3runEv+0x2B) [0x81ca56b]
[2014/10/13 12:12:53.744]  repl_coordinator_impl_test(+0x2732F4) [0x82bb2f4]
[2014/10/13 12:12:53.744]  libpthread.so.0(+0x5912) [0xf7db3912]
[2014/10/13 12:12:53.744]  libc.so.6(clone+0x5E) [0xf7d264ae]
[2014/10/13 12:12:53.744] -----  END BACKTRACE  -----
[2014/10/13 12:12:53.744]                   116.1499 ms



Comments
Comment by Githook User [ 13/Oct/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-15620 Must hold ReplicationCoordinatorImpl::_mutex while scanning the waiter list.
Branch: master
https://github.com/mongodb/mongo/commit/d5625da51a529303702ef834f18d3c94ad70aa5f
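The commit message summarizes the fix in one line. The following is a hedged, simplified sketch of the general pattern it names, not the actual MongoDB code. When a coordinator keeps a list of pointers to waiters that live in other threads' stack frames, that list must only be scanned while holding the same mutex that guards registration and removal. Otherwise a waiter can unregister and be destroyed mid-scan, leaving the scanning thread to lock a mutex in freed memory. That failure mode is consistent with, though not proven by, the pthread_mutex_lock fault inside _stepDownFinish in the backtrace above. The class and method names (Waiter, WaiterList, add, remove, signalAll) are hypothetical stand-ins; WaiterList::_mutex plays the role of ReplicationCoordinatorImpl::_mutex.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical per-operation waiter: each blocked thread owns one of these
// (typically on its own stack) and waits on its condition variable.
class Waiter {
public:
    // Called by the notifier; guarded by the waiter's own mutex.
    void signal() {
        std::lock_guard<std::mutex> lk(_m);
        _signaled = true;
        _cv.notify_all();
    }
    // Blocks until signal() has been called.
    void wait() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return _signaled; });
    }
    bool signaled() {
        std::lock_guard<std::mutex> lk(_m);
        return _signaled;
    }

private:
    std::mutex _m;
    std::condition_variable _cv;
    bool _signaled = false;
};

// Hypothetical registry of waiters protected by a single coordinator mutex
// (standing in for ReplicationCoordinatorImpl::_mutex in this sketch).
class WaiterList {
public:
    void add(Waiter* w) {
        std::lock_guard<std::mutex> lk(_mutex);
        _waiters.push_back(w);
    }
    // A waiter must unregister, under the same mutex, before its storage
    // is destroyed (e.g. before returning from the frame that owns it).
    void remove(Waiter* w) {
        std::lock_guard<std::mutex> lk(_mutex);
        _waiters.erase(std::remove(_waiters.begin(), _waiters.end(), w),
                       _waiters.end());
    }
    // Hold _mutex for the entire scan so no waiter can be removed (and then
    // destroyed) while this thread still dereferences a pointer to it.
    // Returns the number of waiters signaled.
    std::size_t signalAll() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (Waiter* w : _waiters) {
            w->signal();
        }
        return _waiters.size();
    }

private:
    std::mutex _mutex;
    std::vector<Waiter*> _waiters;
};
```

In this sketch, remove() blocks until any in-progress signalAll() scan finishes, so a waiter that unregisters before going out of scope is never touched after destruction. The lock order is always WaiterList::_mutex before Waiter::_m, which avoids deadlock as long as callers never invoke remove() while holding a waiter's own mutex.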

Generated at Thu Feb 08 03:38:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.