[SERVER-25160] Drain and catchup modes shouldn't continue on stepdown Created: 19/Jul/16 Updated: 25/Jan/17 Resolved: 25/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.12 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.2 |
| Sprint: | Repl 18 (08/05/16), Repl 2016-08-29 |
| Participants: | |
| Description |
|
If the primary steps down in drain mode, it should stop the drain mode and clean up its state. |
| Comments |
| Comment by Githook User [ 25/Aug/16 ] |
|
Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>
Message: |
| Comment by Siyuan Zhou [ 15/Aug/16 ] |
|
Step-down can happen in two places: 1) while the primary is scanning freshness on other nodes; 2) while the primary is trying to catch up. The first case is already covered and unit tested. The second case will be fixed as part of
The solution is to check whether the node is still primary when finishing drain mode. In 3.3, step-down signals the replication coordinator to finish catch-up; the replication coordinator then finishes catch-up and enters drain mode as normal. Bgsync sees drain mode and puts a sentinel in the oplog buffer so that the applier exits drain mode. Finally, the applier calls signalDrainComplete(), notices that the node is no longer primary, cleans up its state, and stops the transition to primary.
Currently, when a primary steps down in drain mode, it still tries to finish the transition to primary and allows external writes. However, after the node writes the no-op "new primary" entry into the oplog, bgsync notices the divergence from its sync source and triggers rollback, which disables external writes, so the effect of this bug is limited. |
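
The proposed flow can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: the class `ToyReplCoordinator`, the function `run_applier`, and the `SENTINEL` object are hypothetical names invented for illustration, and only `signalDrainComplete()` (mirrored here as `signal_drain_complete`) comes from the comment above. It assumes that step-down merely clears a flag and that the primary check happens when the applier reaches the sentinel.

```python
from collections import deque

# Hypothetical marker that bgsync would place in the oplog buffer.
SENTINEL = object()


class ToyReplCoordinator:
    """Hypothetical stand-in for the replication coordinator (not MongoDB code)."""

    def __init__(self):
        self.elected = True          # won the election, still transitioning
        self.draining = True         # applier is draining the oplog buffer
        self.accepts_writes = False  # external writes are off until drain ends

    def step_down(self):
        # Assumption: step-down only clears the flag; drain mode itself
        # still ends through the applier, as in the comment above.
        self.elected = False

    def signal_drain_complete(self):
        """Leave drain mode; finish the transition only if still elected."""
        self.draining = False
        if not self.elected:
            # Stepped down during drain: clean up, do not become primary.
            self.accepts_writes = False
            return False
        self.accepts_writes = True
        return True


def run_applier(coordinator, oplog_buffer):
    """Apply buffered entries until the sentinel, then signal drain complete."""
    applied = []
    while oplog_buffer:
        entry = oplog_buffer.popleft()
        if entry is SENTINEL:
            return applied, coordinator.signal_drain_complete()
        applied.append(entry)
    return applied, None  # no sentinel seen, drain mode not finished


# Usage: a step-down arrives while two entries are still buffered.
coordinator = ToyReplCoordinator()
coordinator.step_down()
applied, became_primary = run_applier(coordinator, deque(["op1", "op2", SENTINEL]))
```

In this model the buffered entries are still applied, but the node never starts accepting external writes, which is the behavior the fix aims for.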