- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- ALL
- RPL A (10/09/15), Repl B (10/30/15)
While waiting for secondaries to catch up during a step down request, the primary appears to send heartbeats to the secondaries constantly. The step down command should restart the heartbeats once and then let the replication coordinator reschedule new heartbeats every "heartbeatIntervalMillis" ms. This bug appears to have been introduced by SERVER-20671.
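The difference between the two behaviors can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the server's implementation: the function `count_heartbeats`, the polling wait loop, the 100 ms poll period, and the 2000 ms interval are all hypothetical stand-ins chosen for illustration. It models a single secondary and counts how many heartbeats the primary sends while it waits for catch-up.

```python
import heapq

# Assumed value standing in for "heartbeatIntervalMillis".
HEARTBEAT_INTERVAL_MS = 2000


def count_heartbeats(wait_ms, poll_ms, restart_every_poll):
    """Count heartbeats sent to one secondary during the catch-up wait.

    restart_every_poll=True models the reported behavior (hypothetically, a
    restart on every pass of the wait loop); False models the intended
    behavior (one restart, then periodic rescheduling).
    """
    sent = 0
    restarted = False
    events = [(0, "poll")]  # min-heap of (time_ms, kind)
    while events:
        now, kind = heapq.heappop(events)
        if now >= wait_ms:
            break  # the catch-up wait is over
        if kind == "poll":
            if restart_every_poll or not restarted:
                # Restart: cancel the pending heartbeat and send one now.
                events = [e for e in events if e[1] != "heartbeat"]
                heapq.heapify(events)
                heapq.heappush(events, (now, "heartbeat"))
                restarted = True
            heapq.heappush(events, (now + poll_ms, "poll"))
        else:
            # Heartbeat sent; schedule the next one a full interval later.
            sent += 1
            heapq.heappush(events, (now + HEARTBEAT_INTERVAL_MS, "heartbeat"))
    return sent


if __name__ == "__main__":
    print("restart on every pass:", count_heartbeats(10_000, 100, True))
    print("restart once:", count_heartbeats(10_000, 100, False))
```

With a 10 second wait and a 100 ms poll, the restart-on-every-pass model sends 100 heartbeats, while the restart-once model sends 5 (one per 2000 ms interval), which is the steady rate the description expects.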
----------
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.703-0400 d20010| 2015-10-07T16:57:49.703-0400 I COMMAND [conn8] command admin.$cmd command: replSetStepDown { replSetStepDown: 60.0, secondaryCatchUpPeriodSecs: 60.0 } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:150 locks:{ Global: { acquireCount: { r: 1, R: 1 } } } protocol:op_command 61264ms
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.704-0400 d20010| 2015-10-07T16:57:49.703-0400 I COMMAND [conn7] command admin.$cmd command: isMaster { ismaster: 1.0 } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:488 locks:{} protocol:op_command 1226ms
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.704-0400 d20010| 2015-10-07T16:57:49.703-0400 I REPL [replExecDBWorker-2] transition to SECONDARY
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.704-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK [conn7] end connection 127.0.0.1:61013 (6 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK [conn8] end connection 127.0.0.1:61032 (6 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK [conn11] end connection 208.52.191.216:49333 (6 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK [conn14] end connection 208.52.191.216:49367 (5 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277| {
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277| "ok" : 0,
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277| "errmsg" : "By the time we were ready to step down, we were already past the time we were supposed to step down until",
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277| "code" : 50
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277| }
...
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 2015-10-07T16:57:57.682-0400 E QUERY [thread1] Error: [0] != [0] are equal : expected replSetStepDown to close the shell's connection :
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 doassert@src/mongo/shell/assert.js:15:14
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 assert.neq@src/mongo/shell/assert.js:119:5
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 @jstests/replsets/stepdown_long_wait_time.js:95:5
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 @jstests/replsets/stepdown_long_wait_time.js:10:2
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 failed to load: jstests/replsets/stepdown_long_wait_time.js
is related to
- SERVER-20671 step down should resend heartbeats if secondaries are not caught up (Closed)

related to
- SERVER-20964 convert stepdown_killop.js to use fail point instead of bridging (Closed)