[SERVER-78604] ReshardingCoordinatorService Index build deadlocks with OpObserver Created: 03/Jul/23 Updated: 18/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Backlog - Cluster Scalability |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | cs-subteam1, sharding-nyc-subteam1 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Cluster Scalability |
| Operating System: | ALL |
| Participants: | |
| Linked BF Score: | 157 |
| Description |
|
In its _rebuildService, the ReshardingCoordinator tries to build an index, and initialization of the service cannot complete until this index is built. The related resharding OpObserver's onUpdate waits for the service to complete initialization while holding an IX lock. A two-phase index build requires an S lock to complete, and an S lock conflicts with the IX lock already held, so the result can be a deadlock if the collection is not empty when the index is built. Under normal circumstances in 7.0 this is extremely rare or cannot happen (because either the index already exists or the collection is empty), but if a later revision removes the index, a downgrade can deadlock the service. |
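
The wait cycle described above can be illustrated with a small model. This is only a sketch of the reasoning in the description, not MongoDB server code: the node names, `build_wait_graph`, and `find_cycle` are hypothetical, and the compatibility table is the standard intent-lock matrix (IS, IX, S, X), not something taken from the resharding source. It assumes, as the description states, that the index build only needs to wait when the index is missing and the collection is not empty.

```python
# Simplified model of the wait-for cycle; all names here are illustrative.

# Pairs of lock modes that may be held at the same time
# (standard multi-granularity locking matrix).
COMPATIBLE = {
    frozenset(pair)
    for pair in [("IS", "IS"), ("IS", "IX"), ("IS", "S"), ("IX", "IX"), ("S", "S")]
}

ON_UPDATE = "OpObserver onUpdate (holds IX)"
SERVICE_INIT = "coordinator service initialization"
INDEX_BUILD = "two-phase index build (wants S)"


def conflicts(held, requested):
    """Return True if `requested` cannot be granted while `held` is held."""
    return frozenset((held, requested)) not in COMPATIBLE


def build_wait_graph(collection_empty, index_exists):
    """Map each participant to the participant it is blocked on."""
    # onUpdate blocks until the service has finished initializing.
    graph = {ON_UPDATE: SERVICE_INIT}
    if index_exists:
        # Nothing to build, so initialization is not blocked on anything.
        return graph
    # Initialization cannot finish until the index build finishes.
    graph[SERVICE_INIT] = INDEX_BUILD
    if not collection_empty and conflicts("IX", "S"):
        # The build needs S, which cannot be granted while onUpdate holds IX.
        graph[INDEX_BUILD] = ON_UPDATE
    return graph


def find_cycle(waits_for, start):
    """Follow wait edges from `start`; return the cycle as a list, or None."""
    seen = []
    node = start
    while node is not None and node not in seen:
        seen.append(node)
        node = waits_for.get(node)
    if node is None:
        return None
    return seen[seen.index(node):]


if __name__ == "__main__":
    # Downgrade-style scenario: index missing, collection already populated.
    graph = build_wait_graph(collection_empty=False, index_exists=False)
    print(find_cycle(graph, ON_UPDATE))
```

In this model a cycle appears only when the index is missing and the collection is non-empty, which matches the description's point that the normal 7.0 cases (index already present, or empty collection) do not deadlock.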