[SERVER-75069] Reapplying an insert donor state doc op (for split, merge and tenant migration) on secondaries due to WCE can cause a node crash. Created: 20/Mar/23  Updated: 27/Mar/23  Resolved: 27/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Didier Nadeau
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-74966 Improve serverless lock/access blocke... Closed
Related
is related to SERVER-74966 Improve serverless lock/access blocke... Closed
Assigned Teams:
Serverless
Operating System: ALL
Sprint: Server Serverless 2023-04-03
Participants:

 Description   

I found that tenant_migration_donor_op_observer.cpp (shared by the MTM protocol and shard merge) doesn't register an onRollback hook (to release the Serverless lock and uninstall the mtab) on secondaries, with the following justification:

            // onRollback is not registered on secondaries since secondaries should not fail to
            // apply the write.

That is not true. On secondaries, if the WriteUnitOfWork (WUOW) fails with a WriteConflictException (WCE), we retry the WUOW (see here). For any other failure, secondaries don't retry the WUOW; instead, the server crashes.

 

So, if a secondary retries applying the insert state doc oplog entry due to a WCE, it will try to reacquire the Serverless lock and crash the server due to this invariant failure. The fix is to register the onRollback() hook on secondaries as well. This problem affects MTM (multi-tenant migration), merge and split.
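
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the server code: ToyServerlessLock, ToyUnitOfWork, ToyWriteConflict, applyOnce and applyWithRetry are hypothetical stand-ins invented for this example, and the invariant is modelled as a thrown std::logic_error rather than a process abort. It only shows why a retried unit of work needs a rollback handler that releases the lock.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the serverless lock: acquiring it while it is
// already held is treated as an invariant failure (an exception here).
struct ToyServerlessLock {
    bool held = false;
    void acquire() {
        if (held) throw std::logic_error("invariant failure: lock already held");
        held = true;
    }
    void release() { held = false; }
};

// Hypothetical stand-in for a write conflict raised while applying an op.
struct ToyWriteConflict : std::runtime_error {
    ToyWriteConflict() : std::runtime_error("write conflict") {}
};

// Hypothetical unit of work: runs its rollback handlers on destruction
// unless commit() was called first.
class ToyUnitOfWork {
public:
    void onRollback(std::function<void()> fn) { _handlers.push_back(std::move(fn)); }
    void commit() { _committed = true; }
    ~ToyUnitOfWork() {
        if (!_committed)
            for (auto& fn : _handlers) fn();
    }

private:
    std::vector<std::function<void()>> _handlers;
    bool _committed = false;
};

// One attempt at applying the "insert donor state doc" op. While
// conflictsLeft > 0, the attempt fails with a write conflict after the lock
// has already been acquired.
void applyOnce(ToyServerlessLock& lock, bool registerRollback, int& conflictsLeft) {
    ToyUnitOfWork wuow;
    lock.acquire();
    if (registerRollback) wuow.onRollback([&lock] { lock.release(); });
    if (conflictsLeft > 0) {
        --conflictsLeft;
        throw ToyWriteConflict();
    }
    wuow.commit();
}

// Retry loop in the spirit of the secondary's behaviour: write conflicts are
// retried, any other failure propagates. Returns the number of attempts made.
int applyWithRetry(ToyServerlessLock& lock, bool registerRollback, int conflicts) {
    int attempts = 0;
    for (;;) {
        ++attempts;
        try {
            applyOnce(lock, registerRollback, conflicts);
            return attempts;
        } catch (const ToyWriteConflict&) {
            // Retry; the unit of work has already been rolled back by its destructor.
        }
    }
}
```

In this sketch, applyWithRetry(lock, true, 1) succeeds on the second attempt because the rollback handler released the lock, while applyWithRetry(lock, false, 1) throws std::logic_error on the second attempt, which corresponds to the crash described in this ticket.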



 Comments   
Comment by Didier Nadeau [ 27/Mar/23 ]

Work for this will be done in SERVER-74966 as it already concerns the rollback for serverless lock.

Generated at Thu Feb 08 06:29:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.