[SERVER-27285] Add a jsCore passthrough with a replset that periodically SIGKILLs a secondary
Created: 05/Dec/16  Updated: 07/Sep/17  Resolved: 01/Feb/17

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.4.5, 3.5.3

Type: Task
Priority: Major - P3
Reporter: Mathias Stearn
Assignee: Max Hirschhorn
Resolution: Done
Votes: 0
Labels: bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Gantt Dependency
has to be done after SERVER-26741 "Fatal Assertion 16360" triggered by ... Closed
Related
is related to SERVER-26016 Improve testing for Steady State Oplo... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.4, v3.2
Sprint: TIG 2017-01-02, TIG 2017-02-13
Participants:

 Description   

This work is being split out of SERVER-26016.

This suite should work like our existing replica set passthroughs, but randomly send kill -9 to the (non-voting) secondary. It should be possible to start the secondary without connectivity to the primary (using mongobridge) and have it reach the SECONDARY state quickly. When connectivity is restored, it should be able to catch up to the primary and pass our repl validation tests.
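
The kill-then-wait cycle described above can be sketched in Python roughly as follows. This is a minimal illustration and not the resmoke.py implementation: sigkill_and_reap, wait_for_state, and the get_state callback are hypothetical names, and the timeout is a caller-supplied parameter because the description only says "quickly". The sketch assumes a POSIX platform, where SIGKILL is available.

```python
import signal
import subprocess
import sys
import time


def sigkill_and_reap(proc):
    """Send SIGKILL (kill -9) to a child process and return its exit status.

    On POSIX, Popen.wait() reports death by signal N as -N.
    """
    proc.send_signal(signal.SIGKILL)
    return proc.wait()


def wait_for_state(get_state, target, timeout_secs, poll_secs=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns target, or raise TimeoutError.

    get_state is a hypothetical callback that returns the node's current
    replica set member state as a string. clock and sleep are injectable
    so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_secs
    while True:
        state = get_state()
        if state == target:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                "node still in state %r after %s seconds" % (state, timeout_secs))
        sleep(poll_secs)


if __name__ == "__main__":
    # Stand-in for a secondary: a child process that would otherwise idle.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", sigkill_and_reap(child))

    # Stand-in for the restarted node walking through its startup states.
    states = iter(["STARTUP2", "RECOVERING", "SECONDARY"])
    print(wait_for_state(lambda: next(states), "SECONDARY",
                         timeout_secs=5, poll_secs=0))
```

Passing the clock and sleep functions in as parameters keeps the polling loop testable with a fake clock, which matters when the real timeout is measured in minutes.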

Optional: It may be worth starting the secondary without "--replSet" before it replays the oplog in order to test the invariants (begin <= minValid <= oplogDeletePoint <= top of oplog). Any NULL items should be removed from the comparison, but the transitive orderings still hold (minValid <= top of oplog, even if oplogDeletePoint is NULL).
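
One way to express the ordering invariant described above is to drop the NULL items first and then compare each remaining neighbor pair, which is enough to imply the transitive orderings. The sketch below is illustrative only: check_oplog_invariants is a hypothetical helper, and representing each timestamp as a (seconds, increment) tuple is an assumption made so that Python's built-in tuple comparison can stand in for timestamp comparison.

```python
def check_oplog_invariants(begin, min_valid, oplog_delete_point, top_of_oplog):
    """Check begin <= minValid <= oplogDeletePoint <= top of oplog.

    Any argument may be None (NULL); such items are removed before the
    comparison. Returns the names of the items that were compared, and
    raises AssertionError on the first out-of-order neighbor pair.
    """
    named = [
        ("begin", begin),
        ("minValid", min_valid),
        ("oplogDeletePoint", oplog_delete_point),
        ("top of oplog", top_of_oplog),
    ]
    # Remove NULL items; the remaining neighbors are compared directly.
    present = [(name, value) for name, value in named if value is not None]

    for (left_name, left), (right_name, right) in zip(present, present[1:]):
        if left > right:
            raise AssertionError(
                "%s %r is greater than %s %r" % (left_name, left, right_name, right))
    return [name for name, _ in present]


if __name__ == "__main__":
    # oplogDeletePoint is NULL, so minValid is compared with top of oplog.
    print(check_oplog_invariants((10, 1), (12, 3), None, (15, 0)))
```

Because only adjacent surviving items are compared, a NULL oplogDeletePoint automatically turns the check into minValid <= top of oplog, as the description requires.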

Optional: It may be worth pausing replication for most of the run, then unpausing just before killing, to maximize the probability of killing the secondary while it is not idle. (See clean_shutdown_oplog_state.js for an example of this technique.)



 Comments   
Comment by Githook User [ 01/May/17 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-27285 Run jsCore tests while periodically killing secondaries.

Adds a new replica_sets_kill_secondaries_jscore_passthrough.yml suite
that after running tests for a certain period of time (defaults to 30
seconds), resmoke.py will send a SIGKILL to all of the replica set's
secondaries. Each node is then restarted individually with the primary
disabled to verify it reaches the SECONDARY state within 5 minutes of
starting up.

(cherry picked from commit 07f5d153305c0bf10ef55b5dc73eb9a2ca8cb104)
(cherry picked from commit e02c3c769bbcbe26d9132caf28cad6d2d2b4766a)

Also includes the remainder of the changes from
068878410614c789f23b2abc6c5b9680c82abe5e to rename
core_small_oplog_rs_kill_secondaries.yml to
replica_sets_kill_secondaries_jscore_passthrough.yml.
Branch: v3.4
https://github.com/mongodb/mongo/commit/2d7be840ecf1b7928a99def51fe3bea8304738f8

Comment by Judah Schvimer [ 02/Mar/17 ]

Please change the task name to replica_sets_kill_secondaries_jscore_passthrough in the 3.4 backport per SERVER-27995.

Comment by Githook User [ 01/Feb/17 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-27285 Fix exception handling in PeriodicKillSecondaries.

The exception needs a name in order to access its 'args' attribute.
Branch: master
https://github.com/mongodb/mongo/commit/e02c3c769bbcbe26d9132caf28cad6d2d2b4766a

Comment by Max Hirschhorn [ 01/Feb/17 ]

Re-opening this ticket to address an undefined err variable in the exception handling code. I noticed this while going through the test and task logs from my patch build with robert.guo as part of the "assertion extraction" project. pylint would have caught this issue and may be useful to integrate into our other linting practices.

$ pylint -E buildscripts/resmokelib/testing/hooks.py
No config file found, using default configuration
************* Module buildscripts.resmokelib.testing.hooks
E:484,39: Undefined variable 'err' (undefined-variable)
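
For readers unfamiliar with this class of bug, the following is a minimal Python sketch of it. It is a hypothetical reconstruction and not the actual hooks.py code: ServerFailure, report_broken, and report_fixed are made-up names. The point is only that an except clause must bind the exception to a name before its 'args' attribute can be read.

```python
class ServerFailure(Exception):
    """Hypothetical stand-in for whatever exception the hook catches."""


def report_broken():
    try:
        raise ServerFailure("node did not reach SECONDARY")
    except ServerFailure:
        # 'err' was never bound, so this line raises NameError at runtime.
        # This is the pattern pylint reports as undefined-variable.
        return err.args[0]


def report_fixed():
    try:
        raise ServerFailure("node did not reach SECONDARY")
    except ServerFailure as err:
        # Binding the exception with 'as err' makes err.args accessible.
        return err.args[0]


if __name__ == "__main__":
    print(report_fixed())
    try:
        report_broken()
    except NameError as name_err:
        print("broken variant failed:", name_err)
```

The broken variant only fails when the except branch actually runs, which is why a static check can find it earlier than a test run that never triggers the error path.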

Comment by Githook User [ 31/Jan/17 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-27285 Run jsCore tests while periodically killing secondaries.

Adds a new core_small_oplog_rs_kill_secondaries.yml suite that after
running tests for a certain period of time (defaults to 30 seconds),
resmoke.py will send a SIGKILL to all of the replica set's secondaries.
Each node is then restarted individually with the primary disabled to
verify it reaches the SECONDARY state within 5 minutes of starting up.
Branch: master
https://github.com/mongodb/mongo/commit/07f5d153305c0bf10ef55b5dc73eb9a2ca8cb104
