[SERVER-25666] Add invariants to ensure proper primary transition state changes Created: 17/Aug/16  Updated: 25/Jan/17  Resolved: 26/Sep/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.3.15

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Siyuan Zhou
Resolution: Done Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Repl 2016-09-19, Repl 2016-10-10
Participants:

 Description   

enum PrimaryTransitionState {
Secondary,
CatchingUp,
WaitingForDrainToBecomePrimary,
WaitingForDrainButStaySecondary,
Primary,
}

If a stepdown occurs in CatchingUp, we'd go straight back to Secondary. If it
happens in WaitingForDrainToBecomePrimary, we'd go to
WaitingForDrainButStaySecondary.



 Comments   
Comment by Githook User [ 26/Sep/16 ]

Author:

{u'username': u'visualzhou', u'name': u'Siyuan Zhou', u'email': u'siyuan.zhou@mongodb.com'}

Message: SERVER-25666 add invariants to ensure proper primary transition state changes

This reverts commit 803d0a0d56690ebb11dead6beb541517a944ef4a.
Branch: master
https://github.com/mongodb/mongo/commit/be9da942f797a9255f00eaff83f66e9f58d8b6ff

Comment by Siyuan Zhou [ 08/Sep/16 ]

When we scheduled futures works but are about to change the assumptions of those future works (e.g. we're stepping down), there are 3 ways to clean up the states.
1) Store the callback handle somewhere when scheduling it and cancel the jobs on transition. Whoever cancels the jobs cleans up the states left by the jobs.
2) Store the callback handle somewhere when scheduling it and cancel the jobs on transition. Let the cancelled job clean up the states by itself.
3) Don't store and cancel the callback, let it run and overlap with unexpected states, but the callback will discover the assumption (e.g. we're still primary) has changed and give up eventually.

We use all of three patterns in the codebase. Option 1 is the approach proposed by this ticket, but unfortunately we go with option 3 in several places. Freshness scan for catching up is one of them and election callbacks is another. Cleaning up the states asynchronously makes such invariance check impossible, so the rule becomes whenever the states (e.g. _isCatchingUp) are used and assumptions are made, its caller is responsible for taking care of its states. Adding invariant() in callbacks guarantees that.

We can come up with a better design and consolidate the code some day.

Comment by Eric Milkie [ 08/Sep/16 ]

We need to clean up some state transition for cleanup mode first.

Comment by Githook User [ 08/Sep/16 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: Revert "SERVER-25666 add invariants to ensure proper primary transition state changes"

This reverts commit 2ef0fbc8fe0c1a483080d2ddcdcd35dd94093487.
Branch: master
https://github.com/mongodb/mongo/commit/803d0a0d56690ebb11dead6beb541517a944ef4a

Comment by Githook User [ 07/Sep/16 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-25666 add invariants to ensure proper primary transition state changes
Branch: master
https://github.com/mongodb/mongo/commit/2ef0fbc8fe0c1a483080d2ddcdcd35dd94093487

Comment by Eric Milkie [ 07/Sep/16 ]

The different boolean variables have different synchronization rules; in particular, we want to avoid locking a mutex when reading _canAcceptNonLocalWrites. Therefore, I'm going to explore adding the invariants to ensure that when we write to any of these booleans, the other boolean values are falsey.

Comment by Eric Milkie [ 24/Aug/16 ]

Understood.

Comment by Spencer Brody (Inactive) [ 24/Aug/16 ]

milkie, my lgtm on https://mongodbcr.appspot.com/90310001/ was predicated on this being done. If we don't do this for 3.4 then I'd really like us to at least add the invariants that I suggested to redbeard0531 on that CR.

Generated at Thu Feb 08 04:09:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.