Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.2.0-rc0
Affects Version/s: None
Component/s: Replication
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
RPL A (10/09/15), Repl B (10/30/15)
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

While waiting for secondaries to catch up during a step-down request, the primary appears to send heartbeats to the secondaries continuously. The step-down command should restart the heartbeats only once and then let the replication coordinator schedule subsequent heartbeats every "heartbeatIntervalMillis" ms. This bug appears to have been introduced by SERVER-20671.
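The difference between the intended and the observed behavior can be illustrated with a minimal sketch. It assumes a simplified model in which the step-down loop re-checks secondary progress every `poll_ms` milliseconds and a heartbeat round goes out whenever heartbeats are restarted. The function name, the poll interval, and the 2000 ms value for "heartbeatIntervalMillis" are assumptions for illustration, not taken from the server code.

```python
# Hypothetical model of heartbeat scheduling while a primary waits for
# secondaries to catch up during replSetStepDown. Names and values are
# illustrative assumptions, not the server's actual implementation.

HEARTBEAT_INTERVAL_MS = 2000  # assumed value of heartbeatIntervalMillis


def count_heartbeats(wait_ms, poll_ms, restart_every_poll):
    """Return how many heartbeat rounds are sent during a catch-up wait.

    wait_ms: how long the primary waits for secondaries to catch up.
    poll_ms: how often the step-down loop re-checks secondary progress.
    restart_every_poll: True models the reported bug (heartbeats are
        restarted on every loop iteration); False models the intended
        behavior (restart once, then follow the regular interval).
    """
    sent = 0
    next_scheduled = None  # time of the next regularly scheduled round
    restarted = False
    for now in range(0, wait_ms + 1, poll_ms):
        if restart_every_poll or not restarted:
            # Restarting heartbeats sends a round immediately and
            # resets the regular schedule.
            sent += 1
            next_scheduled = now + HEARTBEAT_INTERVAL_MS
            restarted = True
        elif now >= next_scheduled:
            # Regular round rescheduled by the replication coordinator.
            sent += 1
            next_scheduled += HEARTBEAT_INTERVAL_MS
    return sent


if __name__ == "__main__":
    print("intended:", count_heartbeats(60000, 10, False))
    print("buggy:   ", count_heartbeats(60000, 10, True))
```

Under these assumptions, a 60 s catch-up wait with a 10 ms poll sends 31 rounds when heartbeats are restarted once (one restart plus one every 2 s), but 6001 rounds when they are restarted on every poll.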

----------

Task
Logs

[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.703-0400 d20010| 2015-10-07T16:57:49.703-0400 I COMMAND  [conn8] command admin.$cmd command: replSetStepDown { replSetStepDown: 60.0, secondaryCatchUpPeriodSecs: 60.0 } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:150 locks:{ Global: { acquireCount: { r: 1, R: 1 } } } protocol:op_command 61264ms
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.704-0400 d20010| 2015-10-07T16:57:49.703-0400 I COMMAND  [conn7] command admin.$cmd command: isMaster { ismaster: 1.0 } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:488 locks:{} protocol:op_command 1226ms
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.704-0400 d20010| 2015-10-07T16:57:49.703-0400 I REPL     [replExecDBWorker-2] transition to SECONDARY
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.704-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK  [conn7] end connection 127.0.0.1:61013 (6 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK  [conn8] end connection 127.0.0.1:61032 (6 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK  [conn11] end connection 208.52.191.216:49333 (6 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 d20010| 2015-10-07T16:57:49.704-0400 I NETWORK  [conn14] end connection 208.52.191.216:49367 (5 connections now open)
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277| {
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277|   "ok" : 0,
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277|   "errmsg" : "By the time we were ready to step down, we were already past the time we were supposed to step down until",
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277|   "code" : 50
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:49.705-0400 sh20277| }
...
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 2015-10-07T16:57:57.682-0400 E QUERY    [thread1] Error: [0] != [0] are equal : expected replSetStepDown to close the shell's connection :
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 doassert@src/mongo/shell/assert.js:15:14
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 assert.neq@src/mongo/shell/assert.js:119:5
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 @jstests/replsets/stepdown_long_wait_time.js:95:5
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 @jstests/replsets/stepdown_long_wait_time.js:10:2
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400
[js_test:stepdown_long_wait_time] 2015-10-07T16:57:57.685-0400 failed to load: jstests/replsets/stepdown_long_wait_time.js
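In the log above, replSetStepDown was issued with a 60 s step-down period and a 60 s secondaryCatchUpPeriodSecs, and the command took 61264 ms. Because the two periods are equal, a catch-up wait that uses the whole window leaves the step-down deadline already passed, which matches the returned errmsg and code 50 (ExceededTimeLimit). The sketch below is a hypothetical reconstruction of that check inferred from the error message only; the function and parameter names are assumptions, and the command's total duration is used as an approximation of when the primary became ready to step down.

```python
# Hypothetical reconstruction of the deadline check implied by the errmsg
# in the log. Not the server's actual code; names are illustrative.

EXCEEDED_TIME_LIMIT = 50  # error code returned in the log above

ERRMSG = ("By the time we were ready to step down, we were already past "
          "the time we were supposed to step down until")


def check_step_down_deadline(start_ms, step_down_secs, ready_ms):
    """Fail if the primary became ready only after the step-down period ended."""
    step_down_until_ms = start_ms + step_down_secs * 1000
    if ready_ms >= step_down_until_ms:
        return {"ok": 0, "errmsg": ERRMSG, "code": EXCEEDED_TIME_LIMIT}
    return {"ok": 1}


if __name__ == "__main__":
    # Values from the log: replSetStepDown: 60.0, command took 61264 ms.
    print(check_step_down_deadline(0, 60.0, 61264))
```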

is related to

SERVER-20671 step down should resend heartbeats if secondaries are not caught up

Closed

related to

SERVER-20964 convert stepdown_killop.js to use fail point instead of bridging

Closed

Assignee:: Benety Goh
Reporter:: Max Hirschhorn
Participants:: Benety Goh, Githook User, Max Hirschhorn
Votes:: 0
Watchers:: 2

Created:: Oct 07 2015 10:18:23 PM UTC
Updated:: Jan 25 2017 09:59:53 PM UTC
Resolved:: Oct 14 2015 02:12:30 PM UTC