[SERVER-44612] recoverFromOplogAsStandalone with takeUnstableCheckpointOnShutdown should succeed if retried after a successful attempt
Created: 13/Nov/19  Updated: 29/Oct/23  Resolved: 03/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 4.2.2, 4.3.3

Type: Improvement Priority: Major - P3
Reporter: Judah Schvimer Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Backport Requested: v4.2
Sprint: Repl 2019-12-02, Repl 2019-12-16
Participants:
Case:

 Description   

If recoverFromOplogAsStandalone is retried with takeUnstableCheckpointOnShutdown after that parameter combination has already run successfully, the node should do nothing and allow a clean shutdown. The unstable checkpoint taken on that shutdown should effectively be a no-op, since no work has been done and the node already has an up-to-date unstable checkpoint. The proposal is therefore:

If a node starts up with both recoverFromOplogAsStandalone and takeUnstableCheckpointOnShutdown, and the storage engine does not have a stable checkpoint, we will check whether all of the replication metadata indicates that the data files contain a fully up-to-date unstable checkpoint. If so, we will go into read-only mode without doing any replication recovery (since none should be needed) and allow automation to shut down the node at its leisure, as usual. Otherwise, if the replication metadata does not indicate that the unstable checkpoint is a safe one requiring no replication recovery, we will fassert as we do today. This makes the parameter combination idempotent in the "success" case.
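
The proposed startup decision can be sketched as below. This is an illustrative model only, not the server's code: the `ReplMetadata` fields, the `StartupAction` names, and the specific checks in `unstable_checkpoint_is_up_to_date` are assumptions about what "fully up-to-date" might mean, and timestamps are simplified to integers. The stable-checkpoint branch is included only for completeness, since the ticket does not change it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StartupAction(Enum):
    NORMAL_RECOVERY = "normal_recovery"              # stable checkpoint exists (not changed by this ticket)
    READ_ONLY_NO_RECOVERY = "read_only_no_recovery"  # retry after a successful attempt
    FASSERT = "fassert"                              # unsafe unstable checkpoint, same as today


@dataclass
class ReplMetadata:
    """Hypothetical, simplified view of the replication metadata."""
    initial_sync_in_progress: bool
    oplog_truncate_after_point: Optional[int]  # None means "not set"
    applied_through: Optional[int]             # None means "not set"
    min_valid: int
    top_of_oplog: Optional[int]                # None means the oplog is empty


def unstable_checkpoint_is_up_to_date(meta: ReplMetadata) -> bool:
    """Assumed checks: nothing is left for replication recovery to do."""
    if meta.initial_sync_in_progress:
        return False
    if meta.oplog_truncate_after_point is not None:
        return False
    if meta.top_of_oplog is None:
        return False
    if meta.applied_through is not None and meta.applied_through != meta.top_of_oplog:
        return False
    return meta.min_valid <= meta.top_of_oplog


def decide_startup_action(has_stable_checkpoint: bool, meta: ReplMetadata) -> StartupAction:
    """Decision for a node started with both parameters set."""
    if has_stable_checkpoint:
        return StartupAction.NORMAL_RECOVERY
    if unstable_checkpoint_is_up_to_date(meta):
        return StartupAction.READ_ONLY_NO_RECOVERY
    return StartupAction.FASSERT


# A retry after a successful run: no stable checkpoint, metadata is clean.
clean = ReplMetadata(False, None, None, min_valid=10, top_of_oplog=10)
# An unsafe unstable checkpoint: applied_through lags behind the top of the oplog.
dirty = ReplMetadata(False, None, applied_through=7, min_valid=10, top_of_oplog=10)

retry_action = decide_startup_action(False, clean)
unsafe_action = decide_startup_action(False, dirty)
```

Under these assumptions, running the decision twice against unchanged metadata yields the same result, which is the idempotence the proposal asks for.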



 Comments   
Comment by Phil Jordan [ 06/Dec/19 ]

Hey judah.schvimer, josef.ahmad and I reviewed this; LGTM.

Comment by Louisa Berger [ 04/Dec/19 ]

judah.schvimer Looks successful from our end, thank you!

Comment by Louisa Berger [ 03/Dec/19 ]

Filed CLOUDP-53832 to verify that the behavior is as expected. Thank you judah.schvimer!

Comment by Judah Schvimer [ 03/Dec/19 ]

Are you able to get a build from this evergreen run? The artifact will be in the "compile" task on the variant you need.

Comment by Louisa Berger [ 03/Dec/19 ]

judah.schvimer Do you have a build that we can test with for 4.2?

Comment by Githook User [ 02/Dec/19 ]

Author:

{'email': 'judah.schvimer@10gen.com', 'name': 'Judah Schvimer', 'username': 'judahschvimer'}

Message: SERVER-44612 recoverFromOplogAsStandalone with takeUnstableCheckpointOnShutdown should succeed if retried after a successful attempt

(cherry picked from commit d90cdf5eb5f01b93ba7fecc11001dbeb6b040bb8)
Branch: master
https://github.com/mongodb/mongo/commit/f4398b4afbaeaa388f6a7360949af006c632df07

Comment by Githook User [ 02/Dec/19 ]

Author:

{'name': 'Judah Schvimer', 'username': 'judahschvimer', 'email': 'judah.schvimer@10gen.com'}

Message: SERVER-44612 recoverFromOplogAsStandalone with takeUnstableCheckpointOnShutdown should succeed if retried after a successful attempt
Branch: v4.2
https://github.com/mongodb/mongo/commit/d90cdf5eb5f01b93ba7fecc11001dbeb6b040bb8

Generated at Thu Feb 08 05:06:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.