[SERVER-47582] Stepdown in drain mode can trigger deadlock due to clearing OplogTruncateAfterPoint Created: 16/Apr/20 Updated: 29/Oct/23 Resolved: 06/May/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc5, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v4.4
|
||||||||||||||||
| Sprint: | Execution Team 2020-05-04, Execution Team 2020-05-18 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Found the deadlock when testing a behavior in drain mode. Stepdown command hangs. Attached are the log and waiting graph.
Here's the stepdown thread's stacktrace.
It's unclear why this isn't caught by existing stepdown in drain mode tests, like jstests/replsets/step_down_during_draining.js and its friends, probably because the burn-in tests in my patch build runs much more times than master branch. My new test timed out 3 time in burn-in tests. The logic was introduced in |
| Comments |
| Comment by Githook User [ 08/May/20 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}Message: (cherry picked from commit da4a8e0e85627d6febd8bce3fd87f221e0ff97c6) |
| Comment by Githook User [ 06/May/20 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}Message: |
| Comment by Suganthi Mani [ 01/May/20 ] |
|
I also agree to judah.schvimer's point of view as I am also not sure how safe for onStepDownHook to run with ShouldNotConflictWithSecondaryBatchApplicationBlock which might requires investigation. Also, since execution team have plans to remove the PBWM lock, I don't think it's worth additional effort. |
| Comment by Judah Schvimer [ 01/May/20 ] |
|
I lean towards the more targeted solution. We'd like to remove the PBWM in |
| Comment by Dianna Hohensee (Inactive) [ 01/May/20 ] |
|
judah.schvimer suganthi.mani How would you like to proceed? If you guys would like a more global solution for stepdown either never taking the PBWM by places a ShouldNotConflictWithSecondaryBatchApplicationBlock at a high level of the stepdown code, or preemptively taking the PBWM in intent mode, then I need some guidance – or I can hand the ticket back. |
| Comment by Suganthi Mani [ 30/Apr/20 ] |
|
It also makes me to think that we should run the entire onStepDownHook under ShouldNotConflictWithSecondaryBatchApplicationBlock and not just this. This prevents any kind of future deadlocks like this. I am also thinking of making stepdown thread to first acquire PBWM lock in some intent mode and then acquire RSTL lock in X mode. The problem with this will be, we won't be able to step down the node while the node is draining. I am not sure whether the draining step blocking step down will be acceptable. |
| Comment by Suganthi Mani [ 30/Apr/20 ] |
|
dianna.hohensee proposal sounds good to me too. Here is the basic problem, the stepdown thread currently violates the locking order and that lead to the deadlock. Basically, the locking order is To my understanding, PBWM doesn't guarantee any exclusive access. The main purpose of PBWM, I think, is to prevent readers from reading the hole data while the oplog application is in progress. RSTL lock should be good enough to guarantee the exclusive access. |
| Comment by Siyuan Zhou [ 30/Apr/20 ] |
|
dianna.hohensee, the proposal sounds good to me. However, if I remembered correctly truncateAfterPoint is also used on secondary in oplog application. If stepdown doesn't take PBWM, how do we guarantee the exclusive access? I'm curious about Suganthi's thoughts too. |
| Comment by Judah Schvimer [ 30/Apr/20 ] |
|
suganthi.mani, I think you're the resident expert on the stepdown locking. Does the above sound good to you? |
| Comment by Dianna Hohensee (Inactive) [ 30/Apr/20 ] |
|
It sounds like the cycle is OplogApplier -> writers -> stepdown -> OplogApplier. Stepdown wants the PBWM that OplogApplier has. So I'll break the cycle by having stepdown not take the PBWM. Specifically, the write to clear the oplogTruncateAfterPoint will skip taking the PBWM, using the existing ShouldNotConflictWithSecondaryBatchApplicationBlock RAII type. |
| Comment by Tess Avitabile (Inactive) [ 16/Apr/20 ] |
|
Thanks, milkie! |
| Comment by Eric Milkie [ 16/Apr/20 ] |
|
I think this problem actually existed before |