[SERVER-16748] consecutive replSetReconfig calls can trigger an invariant failure
Created: 06/Jan/15  Updated: 15/Jan/15  Resolved: 07/Jan/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.8.0-rc4
Fix Version/s: 2.8.0-rc5

Type: Bug
Priority: Major - P3
Reporter: Randolph Tan
Assignee: Spencer Brody (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

 m31102| 2015-01-06T21:18:02.855+0000 I -        [ReplicationExecutor] Invariant failure !_heartbeatReconfigThread.get() src/mongo/db/repl/repl_coordinator_impl_heartbeat.cpp 333
 m31102| 2015-01-06T21:18:02.890+0000 I CONTROL  [ReplicationExecutor] 
 m31102|  0x8b5f077 0x8b00097 0x8ae2bfa 0x88564f6 0x8856719 0x8856c6a 0x8857ba1 0x8877215 0x887c747 0x887c9f9 0x8876780 0x887c711 0x88786db 0x884f71b 0x8bb0964 0xfd6939 0x1e77ae
 m31102| ----- BEGIN BACKTRACE -----
 m31102| {"backtrace":[{"b":"8048000","o":"B17077"},{"b":"8048000","o":"AB8097"},{"b":"8048000","o":"A9ABFA"},{"b":"8048000","o":"80E4F6"},{"b":"8048000","o":"80E719"},{"b":"8048000","o":"80EC6A"},{"b":"8048000","o":"80FBA1"},{"b":"8048000","o":"82F215"},{"b":"8048000","o":"834747"},{"b":"8048000","o":"8349F9"},{"b":"8048000","o":"82E780"},{"b":"8048000","o":"834711"},{"b":"8048000","o":"8306DB"},{"b":"8048000","o":"80771B"},{"b":"8048000","o":"B68964"},{"b":"FD1000","o":"5939"},{"b":"110000","o":"D77AE"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc5-pre-", "gitVersion" : "9804789bbba304cb0649d7874ef0bb390f536b46", "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:08:06 EDT 2010", "machine" : "i686" }, "somap" : [ { "elfType" : 2, "b" : "8048000" }, { "b" : "325000", "elfType" : 3 }, { "b" : "D33000", "path" : "/lib/i686/nosegneg/librt.so.1", "elfType" : 3 }, { "b" : "69E000", "path" : "/lib/libdl.so.2", "elfType" : 3 }, { "b" : "887000", "path" : "/usr/lib/libstdc++.so.6", "elfType" : 3 }, { "b" : "D69000", "path" : "/lib/i686/nosegneg/libm.so.6", "elfType" : 3 }, { "b" : "2E9000", "path" : "/lib/libgcc_s.so.1", "elfType" : 3 }, { "b" : "FD1000", "path" : "/lib/i686/nosegneg/libpthread.so.0", "elfType" : 3 }, { "b" : "110000", "path" : "/lib/i686/nosegneg/libc.so.6", "elfType" : 3 }, { "b" : "C6F000", "path" : "/lib/ld-linux.so.2", "elfType" : 3 } ] }}
 m31102|  mongod(_ZN5mongo15printStackTraceERSo+0x37) [0x8b5f077]
 m31102|  mongod(_ZN5mongo10logContextEPKc+0x107) [0x8b00097]
 m31102|  mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xCA) [0x8ae2bfa]
 m31102|  mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl26_scheduleHeartbeatReconfigERKNS0_16ReplicaSetConfigE+0x6F6) [0x88564f6]
 m31102|  mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl30_handleHeartbeatResponseActionERKNS0_23HeartbeatResponseActionERKNS_10StatusWithINS0_24ReplSetHeartbeatResponseEEE+0x169) [0x8856719]
 m31102|  mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl24_handleHeartbeatResponseERKNS0_19ReplicationExecutor25RemoteCommandCallbackDataEi+0x3FA) [0x8856c6a]
 m31102|  mongod(_ZNSt17_Function_handlerIFvRKN5mongo4repl19ReplicationExecutor25RemoteCommandCallbackDataEESt5_BindIFSt7_Mem_fnIMNS1_26ReplicationCoordinatorImplEFvS5_iEEPS9_St12_PlaceholderILi1EEiEEE9_M_invokeERKSt9_Any_dataS5_+0x31) [0x8857ba1]
 m31102|  mongod(+0x82F215) [0x8877215]
 m31102|  mongod(_ZNSt17_Function_handlerIFvRKN5mongo4repl19ReplicationExecutor12CallbackDataEESt5_BindIFPFvS5_RKSt8functionIFvRKNS2_25RemoteCommandCallbackDataEEERKNS2_20RemoteCommandRequestERKNS0_10StatusWithINS2_21RemoteCommandResponseEEEESt12_PlaceholderILi1EESD_SG_SL_EEE9_M_invokeERKSt9_Any_dataS5_+0x27) [0x887c747]
 m31102|  mongod(_ZNSt17_Function_handlerIFvvESt5_BindIFSt8functionIFvRKN5mongo4repl19ReplicationExecutor12CallbackDataEEES6_EEE9_M_invokeERKSt9_Any_data+0x29) [0x887c9f9]
 m31102|  mongod(+0x82E780) [0x8876780]
 m31102|  mongod(_ZNSt17_Function_handlerIFvvESt5_BindIFPFvRKSt8functionIS0_EES3_EEE9_M_invokeERKSt9_Any_data+0x11) [0x887c711]
 m31102|  mongod(_ZN5mongo4repl19ReplicationExecutor3runEv+0x4AB) [0x88786db]
 m31102|  mongod(_ZN5boost6detail11thread_dataISt5_BindIFSt7_Mem_fnIMN5mongo4repl19ReplicationExecutorEFvvEEPS6_EEE3runEv+0x2B) [0x884f71b]
 m31102|  mongod(+0xB68964) [0x8bb0964]
 m31102|  libpthread.so.0(+0x5939) [0xfd6939]
 m31102|  libc.so.6(clone+0x5E) [0x1e77ae]
 m31102| -----  END BACKTRACE  -----
 m31102| 2015-01-06T21:18:02.890+0000 I -        [ReplicationExecutor] 
 m31102| 
 m31102| ***aborting after invariant() failure



 Comments   
Comment by Githook User [ 07/Jan/15 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-16748 Don't allow a new heartbeat reconfig to start until we've cleared all state from the previous one
Branch: master
https://github.com/mongodb/mongo/commit/3a6620898c354d62829adc0009a5e65ef858594d

Comment by Andy Schwerin [ 06/Jan/15 ]

There is a race between two events: the heartbeat reconfig store thread resetting _heartbeatReconfigThread as it exits, and the executor finishing the reconfig process and changing the config state from HBReconfiging to Steady. Before scheduling the reconfig-finish work, the reconfig store thread should re-lock its lock, so that _heartbeatReconfigThread will be reset before the config state switches back to Steady. I believe putting the statement lk.lock() at line 441 should be sufficient to fix this.
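
To make the ordering argument above concrete, here is a minimal single-threaded sketch that replays the two event orderings. It is a toy model, not the mongod code: ToyCoordinator, Event, threadHandleSet, kRacyOrder and kRelockedOrder are hypothetical names, the thread handle is reduced to a boolean, and the locking is represented only by the order of events. It assumes that re-acquiring the lock forces the handle reset to happen before the executor can switch back to Steady, as the comment suggests. It does not model the change that was eventually committed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the coordinator's config state.
enum class ConfigState { Steady, HBReconfiguring };

// The three events that take part in the race (names are illustrative).
enum class Event {
    HeartbeatWantsReconfig,    // executor: try to start a heartbeat reconfig
    ExecutorFinishesReconfig,  // executor: config state goes back to Steady
    StoreThreadResetsHandle    // store thread: clears its own thread handle
};

struct ToyCoordinator {
    ConfigState state = ConfigState::Steady;
    bool threadHandleSet = false;  // models _heartbeatReconfigThread.get() != nullptr
    bool invariantFailed = false;  // set instead of aborting the process

    void apply(Event e) {
        switch (e) {
            case Event::HeartbeatWantsReconfig:
                if (state != ConfigState::Steady) return;  // reconfig already running
                if (threadHandleSet) {                      // models the failed invariant
                    invariantFailed = true;
                    return;
                }
                state = ConfigState::HBReconfiguring;
                threadHandleSet = true;
                break;
            case Event::ExecutorFinishesReconfig:
                state = ConfigState::Steady;
                break;
            case Event::StoreThreadResetsHandle:
                threadHandleSet = false;
                break;
        }
    }
};

// Replays a sequence of events and reports whether the modelled invariant held.
bool invariantHolds(const std::vector<Event>& order) {
    ToyCoordinator c;
    for (Event e : order) c.apply(e);
    return !c.invariantFailed;
}

// Without re-locking: the executor finishes and accepts a second reconfig
// before the store thread has reset its handle.
const std::vector<Event> kRacyOrder = {
    Event::HeartbeatWantsReconfig, Event::ExecutorFinishesReconfig,
    Event::HeartbeatWantsReconfig, Event::StoreThreadResetsHandle};

// With the lock re-acquired (assumed effect): the reset comes before the
// executor switches back to Steady, so the second reconfig starts cleanly.
const std::vector<Event> kRelockedOrder = {
    Event::HeartbeatWantsReconfig, Event::StoreThreadResetsHandle,
    Event::ExecutorFinishesReconfig, Event::HeartbeatWantsReconfig};

// Small self-check of the two orderings.
void checkToyModel() {
    assert(!invariantHolds(kRacyOrder));
    assert(invariantHolds(kRelockedOrder));
}
```

Calling checkToyModel() from a main function exercises both orderings. Because the model is single-threaded, it only illustrates why the ordering matters; it does not reproduce the race itself.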
