[SERVER-27160] Narrow race at startup between RSSync setting RECOVERING and BGSync setting ROLLBACK state Created: 22/Nov/16 Updated: 06/Dec/17 Resolved: 13/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.13 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Siyuan Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2017-10-02, Repl 2017-10-23 |
| Participants: |
| Description |
|
At startup, the RSSync thread is responsible for transitioning the node from STARTUP2 to RECOVERING. At the same time, the BGSync thread may decide that rollback is necessary and try to transition to ROLLBACK. If BGSync wins the race and we go into ROLLBACK first, RSSync can then transition us to RECOVERING while rollback is still running. If this happens before the rollback process sets minValid, it can cause RSSync to go live as SECONDARY. In the right kind of network partition, this could theoretically lead to us running for election and being elected PRIMARY. While I don't see any synchronization that would actively prevent this case, it seems fairly unlikely to happen in practice, because it would require BGSync to complete several network round trips before RSSync finishes the small amount of work it does before setting RECOVERING. |
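The interleaving described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the server code and not necessarily the fix that shipped in 3.5.13: `Node`, `set_state`, and `set_state_if` are hypothetical names, and the two "threads" are replayed in a fixed order so that the losing interleaving is deterministic. It contrasts an unconditional state write with a transition that only succeeds from the expected prior state.

```python
import threading
from enum import Enum


class MemberState(Enum):
    STARTUP2 = "STARTUP2"
    RECOVERING = "RECOVERING"
    ROLLBACK = "ROLLBACK"


class Node:
    """Hypothetical stand-in for the replication coordinator's member state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.state = MemberState.STARTUP2

    def set_state(self, new_state):
        # Unconditional write: the lock protects the assignment itself,
        # but nothing checks which state we are leaving.
        with self._lock:
            self.state = new_state

    def set_state_if(self, expected, new_state):
        # Conditional write: succeed only when the node is still in the
        # state the caller believes it is in.
        with self._lock:
            if self.state is not expected:
                return False
            self.state = new_state
            return True


def racy_interleaving():
    node = Node()
    node.set_state(MemberState.ROLLBACK)    # BGSync wins the race
    node.set_state(MemberState.RECOVERING)  # RSSync clobbers ROLLBACK
    return node.state


def guarded_interleaving():
    node = Node()
    node.set_state(MemberState.ROLLBACK)    # BGSync wins the race
    # RSSync only intends to leave STARTUP2, so this attempt is rejected.
    moved = node.set_state_if(MemberState.STARTUP2, MemberState.RECOVERING)
    return node.state, moved
```

In this model `racy_interleaving()` ends in RECOVERING even though rollback is notionally still in progress, while `guarded_interleaving()` leaves the node in ROLLBACK and reports that the RSSync transition did not happen.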
| Comments |
| Comment by Githook User [ 13/Oct/17 ] |
|
Author: {'email': 'siyuan.zhou@mongodb.com', 'name': 'Siyuan Zhou', 'username': 'visualzhou'} Message: |