[SERVER-77018] Deadlock between dbStats and 2 index builds Created: 10/May/23 Updated: 29/Oct/23 Resolved: 17/May/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 7.0.0-rc0, 6.3.1 |
| Fix Version/s: | 7.1.0-rc0, 6.3.2, 6.0.7, 5.0.19, 7.0.0-rc2 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Fausto Leyva (Inactive) | Assignee: | Yujin Kang Park |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Storage Execution |
| Backwards Compatibility: | Minor Change |
| Operating System: | ALL |
| Backport Requested: | v7.0, v6.3, v6.0, v5.0 |
| Sprint: | Execution Team 2023-05-29 |
| Description |
|
If an ongoing index build yields its locks after initiating a bulk insert, it still holds the write lock on the index table at the WiredTiger level. If a dbStats command then comes in, it takes a collection-level MODE_IS lock and attempts to acquire a read lock on the ident the index build is currently writing to, but cannot, because IndexBuild_1 holds the exclusive lock on that ident. (dbStats can see the in-progress index table because collection_impl.cpp iterates through the unfinished indexes.) The problem arises when another operation prevents IndexBuild_1 from re-acquiring its lock, for example a second index build that enqueues a collection MODE_X lock. Together these events can produce a deadlock: IndexBuild_1 waits behind the enqueued MODE_X request, the second index build waits for dbStats to release its MODE_IS lock, and dbStats waits for IndexBuild_1 to release the ident.
Original explanation by Suganthi here |
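The cycle described above can be checked with a small wait-for graph. This is only an illustrative sketch, not server code: the name IndexBuild_2 for the second index build, the `WAIT_FOR` table, and the `find_cycle` helper are all assumptions introduced here for the example.

```python
from typing import Dict, List, Optional

# Wait-for graph: each operation maps to the operations it is blocked on.
# "IndexBuild_2" is a hypothetical name for the second index build.
WAIT_FOR: Dict[str, List[str]] = {
    # Yielded its collection lock, still holds the WiredTiger ident exclusively,
    # and is queued behind IndexBuild_2's pending collection MODE_X request.
    "IndexBuild_1": ["IndexBuild_2"],
    # Its MODE_X request cannot be granted while dbStats holds MODE_IS.
    "IndexBuild_2": ["dbStats"],
    # Holds collection MODE_IS and waits for a read lock on the ident.
    "dbStats": ["IndexBuild_1"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes (first == last), or None."""
    visiting = set()  # nodes on the current DFS path
    done = set()      # nodes fully explored without finding a cycle
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Back edge: the cycle is the path from the first visit of node.
            return path[path.index(node):] + [node]
        visiting.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(" -> ".join(find_cycle(WAIT_FOR) or ["no deadlock"]))
```

Removing any one edge, for instance by having dbStats not wait on the in-progress ident, leaves the graph acyclic.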
| Comments |
| Comment by Githook User [ 13/Jun/23 ] |
|
Author: {'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}Message: |
| Comment by Githook User [ 13/Jun/23 ] |
|
Author: {'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}Message: |
| Comment by Githook User [ 24/May/23 ] |
|
Author: {'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}Message: |
| Comment by Githook User [ 19/May/23 ] |
|
Author: {'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}Message: |
| Comment by Yujin Kang Park [ 17/May/23 ] |
|
Requesting backports back to v5.0. I have verified that the bug is possible up to that version. v4.4 and older versions don't have the problematic freeStorage option, and will not deadlock. |
| Comment by Yujin Kang Park [ 17/May/23 ] |
|
Fixed by removing in-progress builds from 'indexFreeStorageSize' in dbStats. Hopefully, long-term |
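A rough sketch of the idea behind this fix, assuming a simplified model: the `IndexInfo` type, its fields, and `index_free_storage_size` are hypothetical names for illustration and do not correspond to the actual server code. The point is only that in-progress builds are skipped, so their idents are never touched when computing the statistic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IndexInfo:
    """Hypothetical, simplified view of an index catalog entry."""
    name: str
    ready: bool               # False while the index build is still in progress
    free_storage_bytes: int   # reusable space reported for the index table


def index_free_storage_size(indexes: Iterable[IndexInfo]) -> int:
    """Sum free storage over finished indexes only.

    In-progress builds are skipped, so (in this model) nothing needs to be
    read from an ident that an index build may hold exclusively.
    """
    return sum(ix.free_storage_bytes for ix in indexes if ix.ready)


if __name__ == "__main__":
    catalog = [
        IndexInfo("_id_", ready=True, free_storage_bytes=4096),
        IndexInfo("a_1", ready=False, free_storage_bytes=8192),
    ]
    print(index_free_storage_size(catalog))
```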
| Comment by Githook User [ 16/May/23 ] |
|
Author: {'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}Message: |
| Comment by Louis Williams [ 16/May/23 ] |
|
Just a note: this bug requires the caller to pass the freeStorage: true option to dbStats; the option defaults to false. This is probably an issue that only affects Serverless, because they use this option. |
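For reference, a minimal sketch of the command document that opts in to the free-storage statistics. The helper name `build_dbstats_command` is an assumption made for this example; only the `dbStats`, `scale`, and `freeStorage` fields come from the dbStats command itself.

```python
from typing import Any, Dict


def build_dbstats_command(free_storage: bool = False, scale: int = 1) -> Dict[str, Any]:
    """Build a dbStats command document.

    free_storage=False mirrors the server default; only free_storage=True
    requests the free-storage fields that this ticket is about.
    """
    return {
        "dbStats": 1,  # the command name must be the first key
        "scale": scale,
        "freeStorage": 1 if free_storage else 0,
    }


if __name__ == "__main__":
    # Assumption: with PyMongo and a reachable server, this document could be
    # sent as db.command(build_dbstats_command(free_storage=True)).
    print(build_dbstats_command(free_storage=True))
```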
| Comment by Yujin Kang Park [ 15/May/23 ] |
|
Uploading reproducer: server-77018.repro |
| Comment by Suganthi Mani [ 15/May/23 ] |
|
Reposting my slack comment here about WT intricacies on open cursor.
|
| Comment by Eric Milkie [ 11/May/23 ] |
|
My guess is that this affects 6.0 as well. This is important since we are running the free and shared tiers on 6.0 right now. |
| Comment by Fausto Leyva (Inactive) [ 11/May/23 ] |
|
In both HELP tickets, we encountered this deadlock while on version 6.3. I think it's safe to assume this is possible to hit on 7.0 since the main prerequisite for this deadlock is an index build yielding while holding onto the dhandle (of the index table it is writing to) in exclusive mode. |
| Comment by Josef Ahmad [ 11/May/23 ] |
|
What versions are affected? |