Details
Task
Resolution: Won't Do
Major - P3
None
Description
In case this is not already documented: if you have just upgraded the binaries to 4.0 and a node then needs to roll back, that rollback may be unable to complete. In this case, you must downgrade the binary version to 3.6 so the rollback can finish, after which you may upgrade to 4.0 again.
Description of Linked Ticket
In a replica set with all nodes on the v4.0 binary version and in FCV=3.6, a clean shutdown causes a node to set its recovery timestamp to 0. If this happens on a node whose oplog has diverged (i.e. a node that needs to enter rollback), the node cannot complete the rollback, because it has no stable timestamp to roll back to, and recover-to-timestamp requires one. Furthermore, to take a new stable checkpoint the node would have to commit a new majority write, which it should not be able to do until it completes the rollback. It also should not be able to upgrade to FCV=4.0 until it can complete the rollback and replicate new oplog entries from the primary. If FCV=3.6 and we encounter this situation, falling back on the rollbackViaRefetch algorithm may be the appropriate solution. An alternative may be to always use rollbackViaRefetch whenever FCV=3.6.
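The two proposed fixes can be compared with a small decision model. This is only an illustrative sketch, not server code: `NodeState`, `RollbackMethod`, `choose_rollback_method`, and the `always_refetch_in_36` flag are hypothetical names introduced here, and a recovery timestamp of 0 is assumed to mean "no stable timestamp available".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RollbackMethod(Enum):
    # Labels follow the ticket text; they are not server identifiers.
    RECOVER_TO_TIMESTAMP = "recover-to-timestamp"
    ROLLBACK_VIA_REFETCH = "rollbackViaRefetch"


@dataclass
class NodeState:
    # Hypothetical model of the two facts the ticket cares about.
    fcv: str                 # feature compatibility version, e.g. "3.6" or "4.0"
    recovery_timestamp: int  # 0 models "no stable timestamp" (after a clean shutdown)


def choose_rollback_method(
    node: NodeState, always_refetch_in_36: bool = False
) -> Optional[RollbackMethod]:
    """Return the rollback method to use, or None if the rollback cannot proceed."""
    # Second proposal: always use rollbackViaRefetch while FCV is 3.6.
    if node.fcv == "3.6" and always_refetch_in_36:
        return RollbackMethod.ROLLBACK_VIA_REFETCH
    # Recover-to-timestamp needs a stable timestamp to roll back to.
    if node.recovery_timestamp > 0:
        return RollbackMethod.RECOVER_TO_TIMESTAMP
    # First proposal: fall back to rollbackViaRefetch only when FCV is 3.6
    # and no stable timestamp exists.
    if node.fcv == "3.6":
        return RollbackMethod.ROLLBACK_VIA_REFETCH
    # No stable timestamp and no fallback available.
    return None


if __name__ == "__main__":
    stuck_node = NodeState(fcv="3.6", recovery_timestamp=0)
    print(choose_rollback_method(stuck_node))
```

Under this model the node described above (FCV=3.6, recovery timestamp 0) gets rollbackViaRefetch under either proposal; the proposals differ only for FCV=3.6 nodes that still have a stable timestamp.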
Scope of changes
Impact to Other Docs
MVP (Work and Date)
Resources (Scope or Design Docs, Invision, etc.)
Attachments
Issue Links
- documents SERVER-40954 Error message for UnrecoverableRollbackError in FCV 3.6 should recommend downgrading to 3.6 (Closed)