[SERVER-42843] shutting down a node while in PRIMARY state will flap writability flag Created: 16/Aug/19 Updated: 27/Oct/23 Resolved: 19/Aug/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Backlog - Replication Team |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Replication
|
||||||||
| Participants: | |||||||||
| Description |
|
When a node is shut down, either via a signal like SIGINT or via the shutdown command, the shutdown proceedings will attempt a stepdown. This stepdown attempt will toggle the canAcceptNonLocalWrites flag without actually changing the state of the node, and it will also do this while acquiring and releasing the RSTL. This breaks some assumptions for code that is using the canAcceptNonLocalWrites flag as an indication of PRIMARY versus SECONDARY state when performing unreplicated writes such as index builds. You don't even need a successful shutdown for this situation to occur. Here is a log from a one-node replica set after I ran the shutdown command from the shell; it fails because I didn't use the force flag:
The issue is that between setting the canAcceptWrites flag to 0 and setting it to 1, the RSTL is unlocked and locked again, thus allowing other operations the ability to lock the database and observe a state where the node is PRIMARY and yet canAcceptWrites is false. |
| Comments |
| Comment by Siyuan Zhou [ 20/Aug/19 ] |
|
Thanks Eric. Discussed offline. Index build is special because it can do unreplicated writes sometimes, which needs to know the state of replication to assign a right timestamp. Simultaneous index build will solve this issue by making all writes replicated. |
| Comment by Eric Milkie [ 19/Aug/19 ] |
The exact assumption I mentioned in the text, "using the canAcceptNonLocalWrites flag as an indication of PRIMARY versus SECONDARY state". This ticket is describing a situation where my code can lock the RSTL, check canAcceptNonLocalWrites, see that it is false, and yet have the node be in state PRIMARY (and continue to be in state PRIMARY indefinitely afterwards). |
| Comment by Siyuan Zhou [ 19/Aug/19 ] |
I believe shutdown will change its state to Secondary if the conditional stepdown succeeds.
The contract is that holding RSTL guarantees that canAcceptNonLocalWrites doesn't change. One RSTL is released, the node is free to change its state. What assumptions are you referring to here? |
| Comment by Eric Milkie [ 19/Aug/19 ] |
|
If this is something we'd like to simplify in the future, is there a ticket to indicate that? I was expecting to be able to use this ticket for that work but it's now closed. |
| Comment by Eric Milkie [ 16/Aug/19 ] |
|
I filed a separate ticket (linked) to fix the BF here; you can either put this ticket in your backlog to do something about eventually or just resolve it for now. |
| Comment by Judah Schvimer [ 16/Aug/19 ] |
|
Assigning back to Eric after a discussion about a different solution. While this behavior is not the easiest contract, and something we'd like to simplify in the future, it is not necessarily a bug in the canAcceptNonLocalWrites flag. |
| Comment by Eric Milkie [ 16/Aug/19 ] |
|
The indexing code is doing it to determine what timestamp to use when doing the initial catalog write to start an index build. It's either the timestamp associated with the oplog entry if on a secondary, or a timestamp generated by attempting to write a no-op oplog entry, or no timestamp if we can't do that because we're in initial sync or startup recovery. |
| Comment by Judah Schvimer [ 16/Aug/19 ] |
|
What code is checking PRIMARY vs. SECONDARY to make a decision? I'm suspicious of that code as well. |