[SERVER-41035] Rollback should kill all user operations before taking RSTL lock in X
Created: 07/May/19  Updated: 29/Oct/23  Resolved: 17/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.12

Type: Task
Priority: Major - P3
Reporter: Suganthi Mani
Assignee: Judah Schvimer
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-12710 Docs for SERVER-41035: Rollback shoul... Closed
Related
related to SERVER-41216 Rename InterruptedDueToStepDown error... Closed
is related to SERVER-37574 Force reconfig should kill user opera... Closed
is related to SERVER-40594 Range deleter in prepare conflict ret... Closed
is related to SERVER-40700 Deadlock between read prepare conflic... Closed
is related to SERVER-40641 Ensure TTL delete in prepare conflict... Closed
is related to SERVER-41037 Stepup should kill all user operation... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-05-20, Repl 2019-06-03

 Description   

Currently, there is a three-way deadlock. Assume that we are transitioning from secondary to rollback:

  • Thread A (a read thread) acquires the RSTL in IX mode and is blocked by a prepared txn due to a prepare conflict.
  • Rollback enqueues an RSTL request in X mode and is blocked behind thread A.
  • The prepared txn won’t be able to commit until we transition out of rollback, so none of the three can make progress.

Alternatively, thread A might have been blocked on a prepared transaction due to a conflicting DB/collection lock (e.g., the dbhash command).

EDIT: Blocking due to conflicting MongoDB locks is not possible, because prepared txns on secondaries yield their MongoDB locks.
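The cycle above, and why killing user operations before requesting X breaks it, can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the server's C++ implementation: `ToyRSTL` is a hypothetical shared/exclusive lock standing in for the RSTL (IX as shared, X as exclusive, with a pending X request assumed to block new IX requests), the prepare conflict is modeled as waiting on an event that is only set after leaving rollback, and "killing user operations" is modeled as a flag the reader checks while it waits. All class and function names are illustrative.

```python
import threading


class InterruptedDueToReplStateChange(Exception):
    """Stand-in for the server error of the same name (illustrative only)."""


class ToyRSTL:
    """Hypothetical lock: IX holders share it, X is exclusive.
    A pending X request blocks new IX requests (assumed fair queueing)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ix_holders = 0
        self._x_held = False
        self._x_waiters = 0

    def acquire_ix(self):
        with self._cond:
            while self._x_held or self._x_waiters:
                self._cond.wait()
            self._ix_holders += 1

    def release_ix(self):
        with self._cond:
            self._ix_holders -= 1
            self._cond.notify_all()

    def acquire_x(self, timeout):
        """Return True if X was granted within `timeout` seconds."""
        with self._cond:
            self._x_waiters += 1
            try:
                granted = self._cond.wait_for(
                    lambda: not self._x_held and self._ix_holders == 0, timeout)
                if granted:
                    self._x_held = True
                return granted
            finally:
                self._x_waiters -= 1
                self._cond.notify_all()

    def release_x(self):
        with self._cond:
            self._x_held = False
            self._cond.notify_all()


def read_thread(rstl, holding_ix, prepared_committed, killed, outcome):
    """Thread A: holds the RSTL in IX while waiting out a prepare conflict."""
    rstl.acquire_ix()
    holding_ix.set()
    try:
        while not prepared_committed.is_set():
            # Check for a kill signal while waiting on the prepared txn.
            if killed.wait(timeout=0.01):
                raise InterruptedDueToReplStateChange()
        outcome.append("read ok")
    except InterruptedDueToReplStateChange:
        outcome.append("interrupted")
    finally:
        rstl.release_ix()  # releasing IX is what lets X through


def enter_rollback(kill_user_ops_first, timeout=0.5):
    """Return (got_x, reader_outcome) for one simulated transition."""
    rstl = ToyRSTL()
    holding_ix = threading.Event()
    prepared_committed = threading.Event()  # only set after leaving ROLLBACK
    killed = threading.Event()
    outcome = []
    reader = threading.Thread(
        target=read_thread,
        args=(rstl, holding_ix, prepared_committed, killed, outcome))
    reader.start()
    holding_ix.wait()  # thread A now holds IX

    if kill_user_ops_first:
        killed.set()  # the fix: kill user operations before asking for X
    got_x = rstl.acquire_x(timeout)  # timeout only keeps the demo finite
    if got_x:
        rstl.release_x()  # rollback work would happen while holding X
    # Leaving ROLLBACK lets the prepared txn commit; a surviving reader finishes.
    prepared_committed.set()
    reader.join()
    return got_x, outcome
```

With `kill_user_ops_first=False`, the X request is never granted while thread A waits; the deadlock in the description corresponds to that wait never ending, and the timeout exists only so the demo terminates. With `kill_user_ops_first=True`, thread A is interrupted, releases IX, and rollback gets X.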



 Comments   
Comment by Judah Schvimer [ 20/May/19 ]

SERVER-41216 is complete, so we now kill operations during rollback with InterruptedDueToReplStateChange.

Comment by Judah Schvimer [ 17/May/19 ]

For documentation and drivers: we now kill all user operations on entering rollback, before we close connections. We currently do so with the error code NotMasterOrSecondary, but we plan to change that to InterruptedDueToReplStateChange in SERVER-41216. Both are already retryable error codes, so this should have limited downstream impact.
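The limited downstream impact follows from clients already retrying on these codes. A minimal client-side sketch, assuming a hypothetical `ServerError` type that exposes the server's error code name; this is not any driver's API, the retryable set contains only the two codes named in the comment, and real drivers apply their own, broader retryability rules.

```python
class ServerError(Exception):
    """Hypothetical error type carrying the server's error code name."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


# Only the two codes mentioned in the comment; not an exhaustive driver list.
RETRYABLE = {"NotMasterOrSecondary", "InterruptedDueToReplStateChange"}


def run_with_one_retry(op):
    """Run op(); if it fails with a retryable code, retry it exactly once."""
    try:
        return op()
    except ServerError as e:
        if e.code_name not in RETRYABLE:
            raise  # non-retryable errors surface unchanged
        return op()


# Usage: an operation killed once by rollback, then succeeding on retry.
calls = []


def killed_once():
    calls.append(1)
    if len(calls) == 1:
        raise ServerError("InterruptedDueToReplStateChange")
    return "ok"


result = run_with_one_retry(killed_once)
```

Because the kill surfaces as an error the client already treats as retryable, switching the code from NotMasterOrSecondary to InterruptedDueToReplStateChange does not change this retry behavior.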

Comment by Githook User [ 17/May/19 ]

Author: Judah Schvimer (judahschvimer, judah@mongodb.com)

Message: SERVER-41035 Rollback should kill all user operations before taking RSTL lock in X
Branch: master
https://github.com/mongodb/mongo/commit/1d1a7182c70a4c13782af9a60067cac5008ca3c6

Generated at Thu Feb 08 04:56:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.