[SERVER-34155] Add clean shutdowns to kill_secondaries and kill_primaries passthroughs Created: 27/Mar/18  Updated: 29/Oct/23  Resolved: 17/May/18

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: Robert Guo (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-34150 Create a passthrough that does clean ... Closed
is related to SERVER-33287 Create passthrough that kills the pri... Closed
Backwards Compatibility: Fully Compatible
Sprint: TIG 2018-05-07, TIG 2018-05-21
Participants:
Story Points: 2

 Description   

Clean shutdowns leave the server in a different state than unclean shutdowns with respect to recover to a stable timestamp, and are interesting in their own right. We do not have much test coverage of clean shutdowns combined with replication.



 Comments   
Comment by Githook User [ 17/May/18 ]

Author: Robert Guo (guoyr) <robert.guo@10gen.com>

Message: SERVER-34155 add clean shutdown primary passthrough
Branch: master
https://github.com/mongodb/mongo/commit/c87d73cd446e14a1b7779752824604196d61f609

Comment by Judah Schvimer [ 16/Apr/18 ]

Thanks! SERVER-34150 is to test fastcount with passthroughs that only do clean shutdowns. This passthrough proposal was to test the interaction of clean and unclean shutdown.

Comment by Samyukta Lanka [ 16/Apr/18 ]

No, at the moment it only does unclean shutdowns.

Also wanted to note that the kill_primaries suite also explicitly excludes tests that use fast count and other commands that use the WiredTiger size storer.

Comment by Judah Schvimer [ 16/Apr/18 ]

samy.lanka and max.hirschhorn, does the kill_primaries hook ever do a clean shutdown followed by a startup with the data files intact? I think that is the only work left to do on this ticket.

Comment by Judah Schvimer [ 02/Apr/18 ]

The kill_secondaries hook does a clean shutdown followed by a replica set startup with data files intact here: https://github.com/mongodb/mongo/blob/b64b512409dc84bd093d7266d5fc201177f85915/buildscripts/resmokelib/testing/hooks/periodic_kill_secondaries.py#L185-L194, so if the kill_primaries hook in SERVER-33287 does the same for primaries, this ticket can become "Gone Away".

Comment by Judah Schvimer [ 29/Mar/18 ]

One goal of SERVER-34150 was to be able to test that fast count is correct across restarts. If this ticket added clean restarts to the kill suites, then we wouldn't gain that coverage because the kill suites can't expect fast count to be correct, but it seems like the kill suites already do some clean restarts.

Comment by Max Hirschhorn [ 29/Mar/18 ]

judah.schvimer, I imagined doing this ticket in a similar manner to SERVER-33287 with the difference being that we call primary.mongod.stop(kill=false) rather than primary.mongod.stop(kill=true). This means that we'd be running with retryable writes enabled and a writeConcern of w="majority". Is there a different case of clean shutdown that you'd want to cover in SERVER-34150?
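For illustration, the difference between the two calls can be sketched as follows. This is a minimal sketch, not resmoke's actual code: the `stop` and `spawn_placeholder` helpers are hypothetical, a sleeping Python child stands in for a mongod, and the mapping of `kill=True` to SIGKILL and `kill=False` to SIGTERM is an assumption that holds for POSIX systems only.

```python
import subprocess
import sys


def stop(process, kill=False):
    """Hypothetical stand-in for a fixture's stop(); not resmoke's actual code.

    kill=False asks the process to exit (SIGTERM on POSIX), which gives it a
    chance to shut down cleanly. kill=True terminates it immediately (SIGKILL
    on POSIX), simulating an unclean shutdown.
    """
    if kill:
        process.kill()
    else:
        process.terminate()
    # Wait for the child to exit and return its exit code.
    return process.wait(timeout=30)


def spawn_placeholder():
    # A sleeping Python child stands in for a mongod process (assumption).
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


if __name__ == "__main__":
    print("clean stop exit code:", stop(spawn_placeholder(), kill=False))
    print("unclean stop exit code:", stop(spawn_placeholder(), kill=True))
```

On POSIX the exit code is the negated signal number, so the two shutdown modes can be told apart after the fact.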

Comment by Judah Schvimer [ 29/Mar/18 ]

I did not mean for SERVER-34150 to be for the rollback fuzzer specifically. SERVER-33587 was more aimed at the rollback fuzzer.

Comment by Max Hirschhorn [ 29/Mar/18 ]

Would that be a duplicate of SERVER-34150?

Maybe? It isn't clear to me if you meant for SERVER-34150 to be specific to the RollbackTest fixture or not.

Comment by Judah Schvimer [ 27/Mar/18 ]

Would that be a duplicate of SERVER-34150?

Comment by Max Hirschhorn [ 27/Mar/18 ]

I'm definitely interested in the kill_primaries hook to also have clean shutdowns that restart the node with data files intact, in addition to hard kills, but that's under development so maybe this ticket is a "works as designed".

How about we repurpose this ticket to add a clean shutdown primary version of the stepdown suite like we're doing with the kill primary version?

Comment by Judah Schvimer [ 27/Mar/18 ]

I guess we do already restart the fixture with data files intact here. My original thought was to alternate clean and unclean shutdowns randomly, though now that I look closer, I guess we're already doing both every 30 seconds. I'm definitely interested in the kill_primaries hook to also have clean shutdowns that restart the node with data files intact, in addition to hard kills, but that's under development so maybe this ticket is a "works as designed".
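The idea of alternating clean and unclean shutdowns randomly could look roughly like the sketch below. It is only an illustration under stated assumptions: `FakeNode` and `run_cycles` are hypothetical names, the node merely records what was done to it, and the 50/50 choice between the two modes is an arbitrary assumption rather than anything the existing hooks do.

```python
import random


class FakeNode:
    """Hypothetical stand-in for a replica set node; records how it was stopped."""

    def __init__(self):
        self.history = []

    def stop(self, kill):
        # kill=True models a hard kill, kill=False a clean shutdown.
        self.history.append("unclean" if kill else "clean")

    def start(self):
        # Restart with data files intact (nothing is wiped in this sketch).
        self.history.append("start")


def run_cycles(node, num_cycles, rng):
    """Stop the node in a randomly chosen mode, then restart it, num_cycles times."""
    for _ in range(num_cycles):
        node.stop(kill=rng.random() < 0.5)
        node.start()
    return node.history


if __name__ == "__main__":
    # Seeding the generator makes a failing sequence reproducible.
    print(run_cycles(FakeNode(), num_cycles=5, rng=random.Random(0)))
```

Passing in a seeded `random.Random` keeps the sequence of shutdown modes reproducible, which helps when a particular ordering exposes a bug.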

Comment by Max Hirschhorn [ 27/Mar/18 ]

judah.schvimer, could you elaborate on when you'd want the secondary to be cleanly shut down in the replica_sets_kill_secondaries_jscore_passthrough.yml test suite? My understanding is that sending a SIGKILL at this point in the hook is done to try and kill the secondary part-way into applying a batch.

Generated at Thu Feb 08 04:35:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.