[SERVER-20964] convert stepdown_killop.js to use fail point instead of bridging Created: 25/Aug/15  Updated: 25/Jan/17  Resolved: 16/Oct/15

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 3.2.0-rc1

Type: Improvement
Priority: Minor - P4
Reporter: Benety Goh
Assignee: Benety Goh
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-20832 step down command should restart hear... Closed
Backwards Compatibility: Fully Compatible
Sprint: RPL A (10/09/15), Repl B (10/30/15)
Participants:

 Description   

On certain platforms, stepdown_killop.js fails while isolating the secondary node due to instability of the mongobridge tool on that platform.

This test does not require the use of mongobridge. Replacing its use with a fail point that suspends oplog application will still allow us to test the desired functionality (killing a step down operation while the primary is waiting for secondaries to catch up) while making the test run a lot faster.
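As a rough illustration of the proposed approach, the sketch below toggles a fail point on a secondary through the `configureFailPoint` admin command. The fail point name `rsSyncApplyStop` and the helper names are assumptions made for this example and are not taken from this ticket. `configureFailPoint` is only available when the server runs with test commands enabled.

```javascript
// Sketch only: 'rsSyncApplyStop' is an assumed fail point name; confirm it
// against the server version under test.
var OPLOG_APPLY_FAIL_POINT = 'rsSyncApplyStop';

// Build a configureFailPoint command document for the given mode.
function failPointCommand(name, mode) {
    return {configureFailPoint: name, mode: mode};
}

// Pause oplog application on a secondary so it cannot catch up to the primary.
// 'conn' is any object exposing adminCommand(), such as a shell connection.
function suspendOplogApplication(conn) {
    return conn.adminCommand(failPointCommand(OPLOG_APPLY_FAIL_POINT, 'alwaysOn'));
}

// Turn the fail point off so the secondary resumes applying the oplog.
function resumeOplogApplication(conn) {
    return conn.adminCommand(failPointCommand(OPLOG_APPLY_FAIL_POINT, 'off'));
}
```

In a test, the suspend helper would be called on each secondary before issuing the step down, and the resume helper during cleanup. Unlike bridging, the nodes remain reachable, so heartbeats keep flowing and the primary should not lose sight of a majority.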

------

e07c554c6a OS X 10.8 DEBUG replicasets_WT

https://evergreen.mongodb.com/task/mongodb_mongo_master_osx_108_debug_replicasets_WT_e07c554c6a5fa62bb1cfc8dfa3f3c02759674637_15_08_24_19_42_20

(1st execution)

https://logkeeper.mongodb.org/build/55db933d90413011a28aee7b/test/55dba4a490413011a28b6f6c

	
[js_test:stepdown_killop] 2015-08-24T19:12:57.272-0400  m31000| 2015-08-24T19:12:57.270-0400 W NETWORK  [ReplExecNetThread-0] Failed to connect to 208.52.190.128:31005, reason: errno:61 Connection refused
[js_test:stepdown_killop] 2015-08-24T19:12:57.273-0400  m31000| 2015-08-24T19:12:57.271-0400 I REPL     [ReplicationExecutor] Error in heartbeat request to mci-osx108-2.build.10gen.cc:31005; HostUnreachable couldn't connect to server mci-osx108-2.build.10gen.cc:31005, connection attempt failed
[js_test:stepdown_killop] 2015-08-24T19:12:57.273-0400  m31000| 2015-08-24T19:12:57.271-0400 W NETWORK  [ReplExecNetThread-5] Failed to connect to 208.52.190.128:31004, reason: errno:61 Connection refused
[js_test:stepdown_killop] 2015-08-24T19:12:57.273-0400  m31000| 2015-08-24T19:12:57.271-0400 I REPL     [ReplicationExecutor] Error in heartbeat request to mci-osx108-2.build.10gen.cc:31004; HostUnreachable couldn't connect to server mci-osx108-2.build.10gen.cc:31004, connection attempt failed
[js_test:stepdown_killop] 2015-08-24T19:12:57.273-0400  m31000| 2015-08-24T19:12:57.271-0400 I REPL     [ReplicationExecutor] can't see a majority of the set, relinquishing primary
[js_test:stepdown_killop] 2015-08-24T19:12:57.273-0400  m31000| 2015-08-24T19:12:57.271-0400 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
[js_test:stepdown_killop] 2015-08-24T19:12:57.274-0400  m31000| 2015-08-24T19:12:57.271-0400 I REPL     [replExecDBWorker-0] transition to SECONDARY



 Comments   
Comment by Benety Goh [ 16/Oct/15 ]

A similar enhancement was applied to stepdown_long_wait_time.js in SERVER-20832

Comment by Githook User [ 16/Oct/15 ]

Author: Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-20964 use failpoint instead of bridging to prevent secondary from catching with primary during step down
Branch: master
https://github.com/mongodb/mongo/commit/469814c1c12cbe1f7f511c5d7c952b909378a3c0

Comment by Siyuan Zhou [ 15/Oct/15 ]

Reopening this ticket as this happened again on OS X 10.8 DEBUG.
Task
Log

Another recent occurrence is on Oct 13.
Task
Log

Comment by Benety Goh [ 15/Oct/15 ]

https://evergreen.mongodb.com/task_history/mongodb-mongo-master/replicasets_legacy?revision=f9478edc15d451a2c4b2606c02d7fdd5066d711e#/stepdown_killop.js=fail&buildVariants=

Comment by Charlie Swanson [ 11/Sep/15 ]

Happened again here

Comment by Jonathan Reams [ 27/Aug/15 ]

This happened again here https://evergreen.mongodb.com/task/mongodb_mongo_master_osx_108_debug_replicasets_3765aa134fce806f3e146e870734922d75632e94_15_08_27_13_29_26

Comment by Benety Goh [ 26/Aug/15 ]

stepdown_killop.js fails intermittently during the setup phase on certain platforms due to instability of the mongobridge tool. Bridging is not necessary for testing the desired functionality (killing an active step down operation). A fail point that suspends oplog application would work much better.
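For context, killing an active step down operation amounts to finding its opid among the in-progress operations and passing it to `killOp`. The helper below is a hypothetical sketch of the lookup step. It assumes the command body is reported under a `query` field (or `command` on other server versions), which should be checked against the version under test.

```javascript
// Return the opid of the first in-progress replSetStepDown command, or null.
// Assumption: the command body appears under 'query' or 'command', depending
// on the server version.
function findStepDownOpId(inprog) {
    for (var i = 0; i < inprog.length; i++) {
        var body = inprog[i].query || inprog[i].command;
        if (body && body.replSetStepDown !== undefined) {
            return inprog[i].opid;
        }
    }
    return null;
}
```

In the shell, the input would be the `inprog` array returned by `db.currentOp()` on the primary, and a non-null result would then be passed to `db.killOp()`.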


5d76623abc OS X 10.8 DEBUG replicasets_WT

https://evergreen.mongodb.com/task/mongodb_mongo_master_osx_108_debug_replicasets_WT_5d76623abc23f453c6530b1f8543476c5d65c4e9_15_08_26_04_49_10

https://logkeeper.mongodb.org/build/55dd4dd2be07c47abf9365ab/test/55dd5fafbe07c47abf93832f

[js_test:stepdown_killop] 2015-08-26T02:44:31.638-0400 		{
[js_test:stepdown_killop] 2015-08-26T02:44:31.638-0400 			"_id" : 1,
[js_test:stepdown_killop] 2015-08-26T02:44:31.638-0400 			"name" : "mci-osx108-7.build.10gen.cc:31004",
[js_test:stepdown_killop] 2015-08-26T02:44:31.638-0400 			"health" : 0,
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"state" : 8,
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"stateStr" : "(not reachable/healthy)",
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"uptime" : 0,
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"optime" : Timestamp(0, 0),
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"lastHeartbeat" : ISODate("2015-08-26T06:44:30.573Z"),
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"lastHeartbeatRecv" : ISODate("2015-08-26T06:43:14.849Z"),
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"pingMs" : NumberLong(145),
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"lastHeartbeatMessage" : "couldn't connect to server mci-osx108-7.build.10gen.cc:31004, connection attempt failed",
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 			"configVersion" : -1
[js_test:stepdown_killop] 2015-08-26T02:44:31.639-0400 		},
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 		{
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"_id" : 2,
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"name" : "mci-osx108-7.build.10gen.cc:31005",
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"health" : 0,
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"state" : 8,
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"stateStr" : "(not reachable/healthy)",
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"uptime" : 0,
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"lastHeartbeat" : ISODate("2015-08-26T06:44:30.573Z"),
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"lastHeartbeatRecv" : ISODate("2015-08-26T06:44:30.187Z"),
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 			"configVersion" : -1
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 		}
[js_test:stepdown_killop] 2015-08-26T02:44:31.640-0400 	],
[js_test:stepdown_killop] 2015-08-26T02:44:31.641-0400 	"ok" : 1
[js_test:stepdown_killop] 2015-08-26T02:44:31.641-0400 }
[js_test:stepdown_killop] 2015-08-26T02:44:31.641-0400 Status for : mci-osx108-7.build.10gen.cc:31000, checking mci-osx108-7.build.10gen.cc:31000/mci-osx108-7.build.10gen.cc:31000
[js_test:stepdown_killop] 2015-08-26T02:44:31.641-0400 Status  : 2  target state : 1
...
[js_test:stepdown_killop] 2015-08-26T02:44:34.578-0400  m31000| 2015-08-26T02:44:34.577-0400 I REPL     [ReplicationExecutor] Error in heartbeat request to mci-osx108-7.build.10gen.cc:31004; HostUnreachable couldn't connect to server mci-osx108-7.build.10gen.cc:31004, connection attempt failed
[js_test:stepdown_killop] 2015-08-26T02:44:35.710-0400 assert.soon failed, msg:waiting for state indicator state for 60000ms
[js_test:stepdown_killop] 2015-08-26T02:44:35.711-0400 doassert@src/mongo/shell/assert.js:11:14
[js_test:stepdown_killop] 2015-08-26T02:44:35.711-0400 assert.soon@src/mongo/shell/assert.js:189:13
[js_test:stepdown_killop] 2015-08-26T02:44:35.711-0400 ReplSetTest.prototype.waitForIndicator@src/mongo/shell/replsettest.js:994:1
[js_test:stepdown_killop] 2015-08-26T02:44:35.711-0400 ReplSetTest.prototype.waitForState@src/mongo/shell/replsettest.js:951:5
[js_test:stepdown_killop] 2015-08-26T02:44:35.711-0400 @jstests/replsets/stepdown_killop.js:28:5
[js_test:stepdown_killop] 2015-08-26T02:44:35.712-0400 @jstests/replsets/stepdown_killop.js:11:2
[js_test:stepdown_killop] 2015-08-26T02:44:35.712-0400 
[js_test:stepdown_killop] 2015-08-26T02:44:35.712-0400 2015-08-26T02:44:35.710-0400 E QUERY    [thread1] Error: assert.soon failed, msg:waiting for state indicator state for 60000ms :
[js_test:stepdown_killop] 2015-08-26T02:44:35.712-0400 doassert@src/mongo/shell/assert.js:11:14
[js_test:stepdown_killop] 2015-08-26T02:44:35.712-0400 assert.soon@src/mongo/shell/assert.js:189:13
[js_test:stepdown_killop] 2015-08-26T02:44:35.712-0400 ReplSetTest.prototype.waitForIndicator@src/mongo/shell/replsettest.js:994:1
[js_test:stepdown_killop] 2015-08-26T02:44:35.712-0400 ReplSetTest.prototype.waitForState@src/mongo/shell/replsettest.js:951:5
[js_test:stepdown_killop] 2015-08-26T02:44:35.713-0400 @jstests/replsets/stepdown_killop.js:28:5
[js_test:stepdown_killop] 2015-08-26T02:44:35.713-0400 @jstests/replsets/stepdown_killop.js:11:2
[js_test:stepdown_killop] 2015-08-26T02:44:35.713-0400 
[js_test:stepdown_killop] 2015-08-26T02:44:35.713-0400 failed to load: jstests/replsets/stepdown_killop.js
