[SERVER-37199] Yield locks of transactions in secondary application Created: 19/Sep/18 Updated: 29/Oct/23 Resolved: 03/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.6 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_durability |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2018-10-08, Repl 2018-10-22, Repl 2018-11-05, Repl 2018-11-19, Repl 2018-12-03, Repl 2018-12-17 |
| Linked BF Score: | 62 |
| Description |
|
Secondary application tends to acquire locks conservatively; for example, all commands acquire the global write lock. This conflicts with prepared transactions. We can yield the locks of transactions on secondaries, since the oplog should contain no conflicting operations thanks to the concurrency control on the primary. An alternative solution is to have secondaries acquire the same locks as the primary, but yielding locks will also fix other issues, e.g. |
| Comments |
| Comment by Githook User [ 03/Dec/18 ] |
|
Author: {'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}Message: |
| Comment by Judah Schvimer [ 15/Nov/18 ] |
|
I would rather yield locks and reacquire them. It makes me less nervous about accidentally allowing in readers without generating prepare conflicts. That said, I agree solution (1) would work. I don't follow how parallel application would be made more difficult by yielding locks though. |
| Comment by Siyuan Zhou [ 15/Nov/18 ] |
|
judah.schvimer, I see the point of recovering locks for step-up. There are two solutions: 1) drop locks for prepared transactions on secondaries, then abort and reapply them on step-up; 2) yield locks for prepared transactions on secondaries, then restore the locks on step-up. The first is easier for secondary application but harder for step-up. The second requires more work in secondary application but less on step-up. Once we support transactions larger than 16MB, solution #2 will need to yield and restore locks for all transactional operations, while solution #1 keeps things simpler by dropping the locks. milkie and I thought solution #1 was easier before considering the state transition. Now I'm leaning towards solution #2 because of the step-up performance cost of solution #1 and the complexity of reapplication. However, if secondary application is going to diverge further from the primary's behavior when we apply transactions in parallel (e.g., they stash ops in different ways), it might be more straightforward to reapply the ops rather than recover them on state transitions. |
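To make the difference between the two solutions concrete, here is a minimal sketch in Python. `ToyLocker` and its methods are hypothetical names invented for illustration; they are not the server's lock manager API, and the resource names are made up. The sketch only shows that yielding keeps enough state to restore the same locks on step-up, whereas dropping leaves nothing to restore, so the operations would have to be reapplied.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLocker:
    """Hypothetical stand-in for a prepared transaction's lock state."""
    held: dict = field(default_factory=dict)     # resource -> lock mode
    yielded: dict = field(default_factory=dict)  # snapshot kept while yielded

    def lock(self, resource: str, mode: str) -> None:
        self.held[resource] = mode

    def yield_locks(self) -> None:
        # Solution 2: remember what was held, then release everything.
        self.yielded = dict(self.held)
        self.held.clear()

    def restore_locks(self) -> None:
        # On step-up: reacquire exactly the locks saved at yield time.
        self.held = dict(self.yielded)
        self.yielded.clear()

    def drop_locks(self) -> None:
        # Solution 1: release and forget; nothing is left to restore.
        self.held.clear()
        self.yielded.clear()


# Locks a prepared transaction might hold (illustrative resources only).
txn = ToyLocker()
txn.lock("global", "IX")
txn.lock("db:test", "IX")
txn.lock("coll:test.foo", "IX")
before = dict(txn.held)

txn.yield_locks()        # prepared on a secondary: holds nothing while yielded
held_while_yielded = dict(txn.held)
txn.restore_locks()      # step-up: the same locks come back
after_restore = dict(txn.held)

txn.drop_locks()         # solution 1 instead: locks are gone for good
txn.restore_locks()      # restoring now recovers nothing
after_drop = dict(txn.held)
```

Under solution 1 the empty `after_drop` is why the transaction has to be aborted and reapplied on step-up: the lock set can only be rebuilt by running the operations again.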
| Comment by Judah Schvimer [ 14/Nov/18 ] |
|
Prepared transactions will need to recover their locks on step up while the RSTL is held. To make step-up writes (mainly dropping temporary collections) not conflict with prepared transactions, we should recover the prepared transaction locks at the very end of the step up, right before releasing the RSTL. |
| Comment by Geert Bosch [ 06/Nov/18 ] |
|
siyuan.zhou, I confirmed for listCollections. We still have a few commands that take MODE_S locks, such as dbStats and dbHash as Tess mentions. We're planning to get rid of these as well. |
| Comment by Tess Avitabile (Inactive) [ 06/Nov/18 ] |
|
We still use S mode for dbhash here. However, I don't think it's important that prepared transactions conflict with dbhash on secondaries. |
| Comment by Siyuan Zhou [ 06/Nov/18 ] |
|
Discussed an alternative solution with geert.bosch: we may yield locks after transactions get prepared on secondaries, so that other commands don't conflict with them, because the concurrency control on the primary guarantees there is no conflict if the ops are applied in oplog order. One concern from Judah was the behavioral change for operations that hold locks in S mode on secondaries, which conflict with prepared transactions' IX locks. According to geert.bosch, we've removed all S mode locks on master. For example, listCollections needs a DB lock in IS mode and then each collection lock in IS mode, rather than a DB lock in S mode. However, to support the yielding behavior, we probably need to change WriteUnitOfWork to introduce a new prepared state for a recovery unit. WriteUnitOfWork is an RAII type designed to represent two-phase locking, and yielding violates those semantics. That being said, we'll keep investigating the original solution. After all, the system is easier to reason about if operations on secondaries acquire the same locks as on the primary. |
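The S versus IX concern above can be illustrated with the textbook multi-granularity lock compatibility matrix. This is a sketch in Python, not code from the server; `COMPATIBLE` and `conflicts` are names made up for this example.

```python
# Textbook compatibility for intent-shared (IS), intent-exclusive (IX),
# shared (S) and exclusive (X) modes. Illustrative only.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def conflicts(held: str, requested: str) -> bool:
    """Return True if a request in `requested` mode must wait on `held`."""
    return requested not in COMPATIBLE[held]


# A prepared transaction holds IX on the database.
prepared_txn_db_lock = "IX"

# A DB lock in S mode would wait on the prepared transaction.
s_mode_blocked = conflicts(prepared_txn_db_lock, "S")

# A DB lock in IS mode (the listCollections pattern described above) would not.
is_mode_blocked = conflicts(prepared_txn_db_lock, "IS")

# A conservative exclusive lock taken during secondary application would wait.
x_mode_blocked = conflicts(prepared_txn_db_lock, "X")
```

With this matrix, replacing S with IS on the database removes the conflict with a prepared transaction's IX lock, while any exclusive lock still conflicts unless the transaction's locks are yielded.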
| Comment by Siyuan Zhou [ 19/Sep/18 ] |
|
We should also audit the applyOps command's oplog entry path to make sure there is no way to generate oplog entries that take the Global lock during oplog application. |