[SERVER-78604] ReshardingCoordinatorService Index build deadlocks with OpObserver Created: 03/Jul/23  Updated: 18/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: cs-subteam1, sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-62720 _configsvrReshardCollection can fail ... Closed
Assigned Teams:
Cluster Scalability
Operating System: ALL
Participants:
Linked BF Score: 157

Description

In its _rebuildService, the ReshardingCoordinator tries to build an index, and initialization of the service cannot complete until that index is built. The related resharding OpObserver onUpdate waits for the service to complete initialization while holding an IX lock. Two-phase index builds require an S lock to complete, and S conflicts with IX, so the index build waits on the update while the update waits on initialization. This can result in deadlock if the collection is not empty when the index is built.
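
The wait-for cycle described above can be sketched as a small model. This is only an illustration, not server code: the node names (`update_op`, `service_init`, `index_build`) and the helpers `build_wait_graph` and `find_cycle` are hypothetical, and the model assumes that no S lock is requested when the collection is empty. The compatibility table is the standard multi-granularity one, in which IX and S conflict.

```python
# Hypothetical model of the wait-for cycle; none of these names are server APIs.

# Standard multi-granularity lock compatibility (True = may be held together).
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S", "IS"): True,   ("S", "IX"): False,  ("S", "S"): True,   ("S", "X"): False,
    ("X", "IS"): False,  ("X", "IX"): False,  ("X", "S"): False,  ("X", "X"): False,
}


def build_wait_graph(collection_empty):
    """Build a wait-for graph in which each waiter maps to the one thing it waits on."""
    held = {"update_op": "IX"}  # onUpdate holds an IX lock while it waits
    waits_for = {
        "update_op": "service_init",    # onUpdate waits for service initialization
        "service_init": "index_build",  # initialization waits for the index build
    }
    # Assumption: only a build over a non-empty collection requests the S lock.
    if not collection_empty:
        requested = "S"
        for holder, mode in held.items():
            if not COMPATIBLE[(mode, requested)]:
                waits_for["index_build"] = holder  # build blocks behind the IX holder
    return waits_for


def find_cycle(waits_for):
    """Return the nodes forming a cycle in the wait-for graph, or None."""
    for start in waits_for:
        path = [start]
        node = waits_for.get(start)
        while node is not None and node not in path:
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            return path[path.index(node):]
    return None


if __name__ == "__main__":
    print(find_cycle(build_wait_graph(collection_empty=False)))
    print(find_cycle(build_wait_graph(collection_empty=True)))
```

In this model the non-empty case yields a three-node cycle, while the empty case yields none, which matches the condition stated in the description.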

In 7.0 this is extremely rare or cannot happen under normal circumstances, because either the index already exists or the collection is empty. However, if a later revision removes the index, a downgrade can deadlock the service.


Generated at Thu Feb 08 06:38:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.