[SERVER-27160] Narrow race at startup between RSSync setting RECOVERING and BGSync setting ROLLBACK state
Created: 22/Nov/16  Updated: 06/Dec/17  Resolved: 13/Oct/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.5.13

Type: Bug
Priority: Major - P3
Reporter: Mathias Stearn
Assignee: Siyuan Zhou
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate: is duplicated by SERVER-27982 "Unnecessary loop in RSDataSync" (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2017-10-02, Repl 2017-10-23
Participants:

 Description   

At startup, the RSSync thread is responsible for transitioning the node from STARTUP2 to RECOVERING. At the same time, the BGSync thread may decide that rollback is necessary and try to transition the node to ROLLBACK. If BGSync wins the race and we go into ROLLBACK first, RSSync can then transition us to RECOVERING while rollback is still running. If this happens before the rollback process sets minValid, RSSync can take the node live as SECONDARY. In the right kind of network partition, this could theoretically lead to the node running for election and being elected PRIMARY.

While I don't see any synchronization that would actively prevent this case, it seems fairly unlikely to happen in practice because it would require BGSync to complete several network round trips before RSSync is able to do the small amount of work it does before setting RECOVERING.
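
The interleaving described above can be replayed in a small sketch. This is only an illustration of the race and of one generic way to guard against it (a compare-and-set style transition); it is not the change made in the linked commit, and every name in it (`ReplState`, `set_unconditionally`, `transition_if`) is hypothetical rather than part of the MongoDB code base.

```python
import threading
from enum import Enum


class MemberState(Enum):
    STARTUP2 = "STARTUP2"
    RECOVERING = "RECOVERING"
    ROLLBACK = "ROLLBACK"
    SECONDARY = "SECONDARY"


class ReplState:
    """Hypothetical holder for a node's member state (not MongoDB's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = MemberState.STARTUP2

    @property
    def state(self):
        with self._lock:
            return self._state

    def set_unconditionally(self, new_state):
        # Unguarded write: the last writer wins, which is the racy pattern.
        with self._lock:
            self._state = new_state

    def transition_if(self, expected, new_state):
        # Guarded write: change state only if the node is still in `expected`.
        with self._lock:
            if self._state is not expected:
                return False
            self._state = new_state
            return True


# Replay the bad interleaving sequentially: BGSync wins the race.
racy = ReplState()
racy.set_unconditionally(MemberState.ROLLBACK)    # BGSync: rollback needed
racy.set_unconditionally(MemberState.RECOVERING)  # RSSync: overwrites ROLLBACK
racy_result = racy.state

# Same interleaving, but each thread states the state it expects to leave.
guarded = ReplState()
bgsync_ok = guarded.transition_if(MemberState.STARTUP2, MemberState.ROLLBACK)
rssync_ok = guarded.transition_if(MemberState.STARTUP2, MemberState.RECOVERING)
guarded_result = guarded.state
```

In the unguarded replay the node ends up in RECOVERING even though rollback has not finished. In the guarded replay the second transition is rejected because the node has already left STARTUP2, so it stays in ROLLBACK.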



 Comments   
Comment by Githook User [ 13/Oct/17 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-27160 Narrow race at startup between RSSync setting RECOVERING and BGSync setting ROLLBACK state
Branch: master
https://github.com/mongodb/mongo/commit/cbafd0521cd94a26ce790787d7d220c707132a48
