[SERVER-40498] Writing transaction oplog entries must not take locks while holding an oplog slot Created: 05/Apr/19  Updated: 29/Oct/23  Resolved: 09/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.11

Type: Bug Priority: Major - P3
Reporter: Tess Avitabile (Inactive) Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-40522 OplogSlotReserver must take a global ... Closed
Related
related to SERVER-40522 OplogSlotReserver must take a global ... Closed
is related to SERVER-35367 Hold locks in fewer callers of waitFo... Closed
is related to SERVER-36534 Don't acquire locks on oplog when wri... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2019-04-22, Repl 2019-05-06, Repl 2019-05-20
Participants:
Linked BF Score: 30

 Description   

It is illegal to acquire new locks while holding open an oplog slot, since it can cause the deadlock described in SERVER-35367. We do this in a few places when writing transaction oplog entries:

When preparing a transaction, we reserve oplog slots here or here, then stash the locker here or here, then take a global lock here or here.

When aborting a prepared transaction, we reserve an oplog slot here, then release locks here, then take a global lock here.

When committing a prepared transaction, we reserve an oplog slot here, then release locks here, then take a global lock here.

(Note that the global lock acquisition for writing oplog entries for unprepared transaction commit is not a problem, since the locker is already holding the global lock.)
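The invariant described above can be sketched as a tiny runtime checker. This is an illustrative model only, not MongoDB's real `LockState`/`Locker` classes: once a slot is reserved, any attempt to acquire a new lock is rejected, so the safe ordering is lock first, slot second.

```cpp
#include <stdexcept>

// Toy model of the invariant this ticket enforces: once a thread has
// reserved an oplog slot, it must not acquire any *new* locks until the
// slot is released (the deadlock hazard from SERVER-35367).
// Names here (LockState, prepareSafely) are illustrative, not MongoDB's.
class LockState {
public:
    void reserveOplogSlot() { _slotHeld = true; }
    void releaseOplogSlot() { _slotHeld = false; }

    // Acquiring a new lock while holding a slot is the bug; the checker
    // rejects it rather than letting the thread deadlock.
    void lockGlobal() {
        if (_slotHeld)
            throw std::logic_error("new lock acquired while holding an oplog slot");
        ++_globalLocks;
    }
    void unlockGlobal() { --_globalLocks; }

private:
    bool _slotHeld = false;
    int _globalLocks = 0;
};

// Safe ordering: take the global lock first, then reserve the slot.
bool prepareSafely(LockState& ls) {
    ls.lockGlobal();
    ls.reserveOplogSlot();
    ls.releaseOplogSlot();
    ls.unlockGlobal();
    return true;
}

// Buggy ordering from the ticket: slot first, then a new lock.
bool prepareBuggy(LockState& ls) {
    ls.reserveOplogSlot();
    try {
        ls.lockGlobal();  // checker fires here
    } catch (const std::logic_error&) {
        ls.releaseOplogSlot();
        return false;
    }
    ls.releaseOplogSlot();
    return true;
}
```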



 Comments   
Comment by Githook User [ 09/May/19 ]

Author: Siyuan Zhou <siyuan.zhou@mongodb.com> (visualzhou)

Message: SERVER-40498 Writing transaction oplog entries must not take locks while holding an oplog slot.
Branch: master
https://github.com/mongodb/mongo/commit/fa878904816b984e00d980336755f762f91919c3

Comment by Githook User [ 04/May/19 ]

Author: Siyuan Zhou <siyuan.zhou@mongodb.com> (visualzhou)

Message: SERVER-40498 Side transaction keeps the same locker
Branch: master
https://github.com/mongodb/mongo/commit/4fb71c5a1c79b745ef56d53a8264ef5fdd202dda

Comment by Siyuan Zhou [ 04/May/19 ]

tess.avitabile, you're right. Never mind, I was fooled by the two-phase locking. The final patch is in code review now.

Comment by Tess Avitabile (Inactive) [ 01/May/19 ]

> I tried a similar approach to acquire a global lock before reserving the oplog slot as in your patch, but even if we do so, the lock will be released before calling OpObserver.

I don't understand why the lock will be released before calling OpObserver. If the lock is still in scope, it should not be released when the WUOW is committed. The only locks that get released due to two-phase locking when the WUOW is committed are the locks that have gone out of scope.
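The two-phase-locking behavior described here can be modeled in a few lines. This is a simplified sketch with illustrative names, not MongoDB's real `Locker`: releases requested inside a WUOW are deferred until commit, so a lock that is never unlocked (still in scope) is untouched by the commit.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of two-phase locking: unlocking inside a WriteUnitOfWork only
// *defers* the release; deferred releases are applied at commit. A lock
// that stays in scope (never unlocked) survives the commit unchanged.
// Illustrative names, not MongoDB's real classes.
class Locker {
public:
    void beginWriteUnitOfWork() { _inWuow = true; }
    void lock(const std::string& id) { ++_held[id]; }

    // Under two-phase locking, releases inside a WUOW are deferred.
    void unlock(const std::string& id) {
        if (_inWuow)
            _pending.push_back(id);
        else
            --_held[id];
    }

    void commitWriteUnitOfWork() {
        for (const auto& id : _pending)
            --_held[id];
        _pending.clear();
        _inWuow = false;
    }

    bool isHeld(const std::string& id) const {
        auto it = _held.find(id);
        return it != _held.end() && it->second > 0;
    }

private:
    bool _inWuow = false;
    std::map<std::string, int> _held;
    std::vector<std::string> _pending;
};
```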

I would also be interested to talk about SERVER-40791 to get a better understanding of that problem.

Comment by Siyuan Zhou [ 01/May/19 ]

tess.avitabile and judah.schvimer, I managed to have the side transaction use the same locker, which fixes preparing a transaction. However, I found another issue with commitPreparedTransaction. As found in SERVER-39926, the transaction is committed to storage and its locks are dropped before the transaction commit op observer runs with a new WUOW. I tried a similar approach to acquire a global lock before reserving the oplog slot as in your patch, but even if we do so, the lock will be released before calling OpObserver.

I can think of two ways to solve this. The solutions are also considered in SERVER-40791.

We could move onPreparedTransactionCommit() to before committing the storage transaction. Since we don't allow storage transaction commit to fail, both the data WT transaction and the oplog WT transaction will commit. To make sure the oplog write isn't visible before the data change, so that afterClusterTime reads are respected, we need to reserve an OplogSlot without using it and release it after both the data and oplog writes commit, as we will do in SERVER-40870.

Alternatively, we could release the recovery unit and the locker from the WUOW and manage them manually on prepared commit, so that we commit only the recovery unit in _commitStorageTransaction() instead of committing the WUOW, and the locker outlives both the data and oplog writes. We'll need the storage team to weigh in on this solution.
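The barrier-slot idea in the first solution can be sketched as a small visibility model. This is an assumption-laden toy, not MongoDB's oplog visibility code: readers may only see entries up to the point before the earliest outstanding reserved slot, so holding an unused slot keeps later oplog writes invisible until it is released.

```cpp
#include <algorithm>
#include <set>

// Toy model of oplog visibility with a "barrier" slot: the visible point
// cannot advance past the earliest outstanding reserved slot, so an
// unused reserved slot holds back visibility of later entries until it
// is released. Illustrative only, not MongoDB's real implementation.
class OplogVisibility {
public:
    long reserveSlot() {
        long t = ++_next;
        _outstanding.insert(t);
        return t;
    }

    void releaseSlot(long t) {
        _outstanding.erase(t);
        _committed = std::max(_committed, t);
    }

    // Readers may only see entries at or before this timestamp.
    long visibleTimestamp() const {
        if (_outstanding.empty())
            return _committed;
        return *_outstanding.begin() - 1;  // earliest hole blocks visibility
    }

private:
    long _next = 0;
    long _committed = 0;
    std::set<long> _outstanding;
};
```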

Comment by Siyuan Zhou [ 19/Apr/19 ]

Assigning this to me as Tess is on vacation next week.

Comment by Tess Avitabile (Inactive) [ 08/Apr/19 ]

Before we reserve the oplog slot, we will manually make a Locker and take a global IX lock. We will pass this Locker into onTransactionPrepare(). In onTransactionPrepare(), we will pass the Locker to SideTransactionBlock, which will use it to populate the opCtx after it has stashed the current locker.
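The plan above might look roughly like the following RAII sketch. All names (`OpCtx`, `SideTransactionBlock`, `holdsGlobalIX`) are simplified stand-ins, not the real MongoDB classes: a pre-built locker that already holds the global IX lock is swapped onto the operation context while the transaction's locker is stashed, and restored when the block ends.

```cpp
// Hedged sketch of the plan above: a side-transaction block swaps in a
// pre-locked locker (global IX already taken before slot reservation)
// and restores the stashed one on destruction. Illustrative names only.
struct Locker {
    bool holdsGlobalIX = false;
};

struct OpCtx {
    Locker* locker = nullptr;
};

class SideTransactionBlock {
public:
    SideTransactionBlock(OpCtx& opCtx, Locker* sideLocker)
        : _opCtx(opCtx), _stashed(opCtx.locker) {
        _opCtx.locker = sideLocker;  // install the pre-locked locker
    }
    ~SideTransactionBlock() {
        _opCtx.locker = _stashed;    // restore the original locker
    }

private:
    OpCtx& _opCtx;
    Locker* _stashed;
};

bool demo() {
    Locker txnLocker;                 // the transaction's (stashed) locker
    Locker sideLocker;
    sideLocker.holdsGlobalIX = true;  // global IX taken before reserving slots
    OpCtx opCtx{&txnLocker};
    {
        SideTransactionBlock block(opCtx, &sideLocker);
        if (!opCtx.locker->holdsGlobalIX)
            return false;             // the oplog-write check would pass here
    }
    return opCtx.locker == &txnLocker;  // original locker restored on exit
}
```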

Comment by Tess Avitabile (Inactive) [ 05/Apr/19 ]

milkie, geert.bosch, I am having trouble fixing the first case, preparing a transaction. The prepare thread is already holding a global lock. However, we stash the transaction resources, so the OperationContext's current locker is not holding a global lock. The thread must not take a global lock because it is holding an oplog slot, but writing to the oplog requires holding a global lock, as enforced by this dassert. I'm thinking about a few different solutions:

  • Remove the dassert. That seems like a bad idea.
  • Stash the transaction resources without stashing the locker. I don't think that will work, since when we commit the WUOW for the oplog write, we will incorrectly release our locks.
  • Expose stashed lockers through the OperationContext somehow. When writing to the oplog, we can check that the OperationContext's current locker or any stashed locker has a global lock.

Do you have any suggestions?
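The third option could be sketched as follows. This is a hypothetical model with illustrative names, not the real `OperationContext`: stashed lockers stay reachable from the operation context, so the oplog-write invariant check accepts a global lock held by the current locker or any stashed one.

```cpp
#include <vector>

// Hedged sketch of option 3: the OperationContext keeps track of stashed
// lockers so the oplog-write check can accept a global lock held by any
// of them, not just the current one. Illustrative names only.
struct Locker {
    bool holdsGlobal = false;
};

class OperationContext {
public:
    explicit OperationContext(Locker* current) : _current(current) {}

    // Stashing swaps in a fresh locker but keeps the old one reachable.
    void stashLocker(Locker* fresh) {
        _stashed.push_back(_current);
        _current = fresh;
    }

    // Relaxed invariant check: pass if the current locker *or* any
    // stashed locker holds the global lock.
    bool anyLockerHoldsGlobal() const {
        if (_current && _current->holdsGlobal)
            return true;
        for (const Locker* l : _stashed)
            if (l->holdsGlobal)
                return true;
        return false;
    }

private:
    Locker* _current;
    std::vector<Locker*> _stashed;
};
```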

Generated at Thu Feb 08 04:55:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.