[SERVER-52564] Deadlock between step down and MongoDOperationContextSession
Created: 02/Nov/20 | Updated: 29/Oct/23 | Resolved: 04/Feb/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0, 4.2.14, 4.4.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sergi Mateo Bellido | Assignee: | Randolph Tan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-csrs-stepdown-also, sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2 |
| Sprint: | Sharding 2020-11-30, Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25, Sharding 2021-02-08 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 18 |
| Description |
|
There is a deadlock between the thread performing the step down and the session catalog migration producer. More concretely, on version 4.4 the thread holding the RSTL (replication state transition lock) might have a stack trace like the following:
The other thread's stack trace may differ depending on the operation; however, there will be a chunk migration thread in the session migration step (most likely in the SessionCatalogMigrationDestination class). |
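The cyclic wait described above can be illustrated with a minimal Python sketch. This is not MongoDB code: the two `threading.Lock` objects are hypothetical stand-ins for the RSTL and a checked-out session, and the assumption that the step-down thread is waiting on the session is taken only from the issue title. Timeouts are used so the script terminates instead of hanging the way the real threads would.

```python
import threading


def simulate_cyclic_wait(timeout=0.2):
    """Model two threads that each hold one lock and wait for the other's.

    Hypothetical stand-ins: `rstl` for the RSTL, `session` for a checked-out
    session. Returns {thread_name: whether the second lock was acquired}.
    """
    rstl = threading.Lock()
    session = threading.Lock()
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            acquired = second.acquire(timeout=timeout)
            results[name] = acquired
            barrier.wait()  # keep holding `first` until both attempts finish
            if acquired:
                second.release()

    threads = [
        # Assumed order for the step-down thread: RSTL first, then session.
        threading.Thread(target=worker, args=("step_down", rstl, session)),
        # Assumed order for the migration thread: session first, then RSTL.
        threading.Thread(target=worker, args=("migration", session, rstl)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate_cyclic_wait())
```

Both entries in the returned dictionary are `False`: neither thread can obtain its second lock while the other still holds it. Without the timeouts, both threads would block forever, which is the general shape of the hang reported here.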
| Comments |
| Comment by Randolph Tan [ 17/May/21 ] |
|
Branch: v4.2 https://github.com/mongodb/mongo/commit/c2295adab43675bfde8c9b2aa5795d9b7fccb6b0 |
| Comment by Githook User [ 23/Apr/21 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com>
Message: (cherry picked from commit 6ee5a25cfc951f6e914dcc9f7d1a63d2e7aeaa67) |
| Comment by Githook User [ 04/Feb/21 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com>
Message: |
| Comment by Sergi Mateo Bellido [ 11/Jan/21 ] |
|
marcos.grillo renctan I found another BF that failed because of this issue: BF-19805. The main difference is that the second thread, the one trying to acquire the RSTL, is trying to migrate some data. |