[SERVER-60509] onReplicationRollback should crash on failure
Created: 06/Oct/21  Updated: 29/Oct/23  Resolved: 26/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0

Type: Bug
Priority: Major - P3
Reporter: Jason Chan
Assignee: Jason Chan
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links: Backports, Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.1, v5.0
Sprint: Repl 2021-10-18, Repl 2021-11-01
Participants:
Linked BF Score: 40

 Description   

Currently, we trigger the rollback op observers after recovering to the stable timestamp (stableTS) and resetting the lastApplied/lastDurable optimes. However, the op observers themselves can fail, which fails the rollback without resetting the lastFetchedOpTime. As a result, we retry rollback but hang indefinitely, because replication first waits for lastApplied to reach lastFetchedOpTime before starting rollback. In this case, we are waiting to apply an oplog entry that no longer exists.
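
The hang described above can be illustrated with a toy model. This is a minimal sketch, not server code: `OpTime`, `ReplState`, `afterFailedRollback`, and `waitForLastAppliedCanFinish` are hypothetical names invented for illustration, and optimes are reduced to plain integers.

```cpp
#include <cstdint>

// Hypothetical stand-in for an optime: a single logical timestamp.
using OpTime = std::uint64_t;

// Hypothetical, simplified view of the replication state involved in the bug.
struct ReplState {
    OpTime stableTimestamp;  // point that rollback recovers to
    OpTime lastApplied;      // newest oplog entry applied locally
    OpTime lastFetched;      // newest oplog entry fetched from the sync source
    OpTime topOfOplog;       // newest oplog entry that still exists locally
};

// Models a rollback attempt in which recovery to the stable timestamp succeeds
// but an op observer then fails: lastApplied is reset and the rolled-back
// oplog entries are gone, yet lastFetched keeps its old value.
constexpr ReplState afterFailedRollback(ReplState s) {
    return ReplState{s.stableTimestamp, s.stableTimestamp, s.lastFetched,
                     s.stableTimestamp};
}

// The retry first waits for lastApplied to reach lastFetched. lastApplied can
// only advance as far as the newest entry that still exists, so the wait can
// finish only if the oplog still reaches lastFetched.
constexpr bool waitForLastAppliedCanFinish(ReplState s) {
    return s.topOfOplog >= s.lastFetched;
}
```

With a stable timestamp of 10 and lastApplied, lastFetched, and the top of the oplog all at 15, the wait can finish before the failed attempt. Afterwards lastApplied and the top of the oplog are back at 10 while lastFetched is still 15, so in this model the wait never completes.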

The server only fails rollback and crashes if the error returned is an UnrecoverableRollbackError; any other error causes rollback to be retried. We tend to use rollback op observers to clean up on-disk state (as demonstrated in the linked BF), so if an observer fails, we should crash instead of retrying rollback.
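
The difference between the two behaviours can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual patch: `ErrorKind`, `Action`, `actionBeforeFix`, `observerMayThrow`, and `observerNoexcept` are hypothetical names. It relies only on the standard C++ rule that an exception escaping a `noexcept` function calls `std::terminate`, which is one way to turn an observer failure into a crash.

```cpp
#include <stdexcept>

// Hypothetical error categories and resulting actions, for illustration only.
enum class ErrorKind { kOk, kUnrecoverableRollbackError, kOther };
enum class Action { kContinue, kCrash, kRetryRollback };

// Behaviour described in the ticket: only an unrecoverable rollback error is
// fatal; any other failure leads to a rollback retry.
constexpr Action actionBeforeFix(ErrorKind e) {
    return e == ErrorKind::kOk ? Action::kContinue
         : e == ErrorKind::kUnrecoverableRollbackError ? Action::kCrash
                                                       : Action::kRetryRollback;
}

// Hypothetical observer whose cleanup of on-disk state may fail by throwing.
void observerMayThrow(bool cleanupFails) {
    if (cleanupFails) {
        throw std::runtime_error("failed to clean up on-disk state");
    }
}

// Marking the hook noexcept means a failure can no longer propagate to the
// caller and be retried: an escaping exception terminates the process.
void observerNoexcept(bool cleanupFails) noexcept {
    observerMayThrow(cleanupFails);
}
```

In this sketch, calling `observerNoexcept(true)` terminates the process rather than returning an error that the caller could classify as retryable.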



 Comments   
Comment by Githook User [ 26/Oct/21 ]

Author: Jason Chan <jason.chan@mongodb.com> (jasonjhchan)

Message: SERVER-60509 onReplicationRollback should crash on failure
Branch: master
https://github.com/mongodb/mongo/commit/f47995bdb089be4beb4d0bb4e277f097df064087

Comment by Githook User [ 26/Oct/21 ]

Author: Jason Chan <jason.chan@mongodb.com> (jasonjhchan)

Message: SERVER-60509 OnReplicationRollback should crash on failure
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/7a6c90046ce85dcccb5a2e46ca702fd8eca22ef6

Comment by Githook User [ 19/Oct/21 ]

Author: Jason Chan <jason.chan@mongodb.com> (jasonjhchan)

Message: SERVER-60509 make OpObserver::OnReplicationRollback noexcept
Branch: master
https://github.com/mongodb/mongo/commit/2e4ce610a2c6d42a2cd137d1bf08d8c547fdfd3a

Comment by Githook User [ 19/Oct/21 ]

Author: Jason Chan <jason.chan@mongodb.com> (jasonjhchan)

Message: SERVER-60509 Mark OpObserver::OnReplicationRollback as noexcept
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/d18f0045bd5fc36094a826deec550d5866425867
